Training: 2022-01-19 17:01:23,058-rank_id: 0
Training: 2022-01-19 17:01:37,193-: loss arcface
Training: 2022-01-19 17:01:37,194-: network r100
Training: 2022-01-19 17:01:37,194-: resume False
Training: 2022-01-19 17:01:37,194-: output work_dirs/ms1mv3_r100_lr02
Training: 2022-01-19 17:01:37,194-: embedding_size 512
Training: 2022-01-19 17:01:37,194-: sample_rate 1.0
Training: 2022-01-19 17:01:37,194-: fp16 True
Training: 2022-01-19 17:01:37,194-: momentum 0.9
Training: 2022-01-19 17:01:37,194-: weight_decay 0.0005
Training: 2022-01-19 17:01:37,194-: batch_size 128
Training: 2022-01-19 17:01:37,194-: lr 0.2
Training: 2022-01-19 17:01:37,194-: dali False
Training: 2022-01-19 17:01:37,194-: verbose 2000
Training: 2022-01-19 17:01:37,194-: frequent 10
Training: 2022-01-19 17:01:37,194-: score None
Training: 2022-01-19 17:01:37,194-: rec /train_tmp/ms1m-retinaface-t1
Training: 2022-01-19 17:01:37,194-: num_classes 93431
Training: 2022-01-19 17:01:37,194-: num_image 5179510
Training: 2022-01-19 17:01:37,194-: num_epoch 25
Training: 2022-01-19 17:01:37,194-: warmup_epoch 0
Training: 2022-01-19 17:01:37,194-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-19 17:01:37,195-: warmup_step 0
Training: 2022-01-19 17:01:37,195-: total_step 126450
Training: 2022-01-19 17:02:44,674-Reducer buckets have been rebuilt in this iteration.
Training: 2022-01-19 17:02:50,588-Speed 3353.03 samples/sec Loss 49.5924 LearningRate 0.1999 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-01-19 17:02:53,589-Speed 3413.52 samples/sec Loss 53.4721 LearningRate 0.1999 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-19 17:02:56,575-Speed 3431.15 samples/sec Loss 54.8683 LearningRate 0.1999 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-01-19 17:02:59,545-Speed 3447.99 samples/sec Loss 55.3376 LearningRate 0.1998 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 17 hours
Training: 2022-01-19 17:03:02,489-Speed 3480.11 samples/sec Loss 55.1316 LearningRate 0.1998 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 16 hours
Training: 2022-01-19 17:03:05,470-Speed 3436.11 samples/sec Loss 56.0799 LearningRate 0.1998 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 15 hours
Training: 2022-01-19 17:03:08,447-Speed 3441.55 samples/sec Loss 55.6759 LearningRate 0.1997 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 15 hours
Training: 2022-01-19 17:03:11,368-Speed 3506.56 samples/sec Loss 54.1533 LearningRate 0.1997 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-01-19 17:03:14,348-Speed 3437.05 samples/sec Loss 54.0587 LearningRate 0.1997 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-01-19 17:03:17,354-Speed 3408.67 samples/sec Loss 53.3182 LearningRate 0.1997 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-01-19 17:03:20,306-Speed 3469.88 samples/sec Loss 52.7126 LearningRate 0.1996 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-01-19 17:03:23,257-Speed 3470.86 samples/sec Loss 52.2235 LearningRate 0.1996 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-01-19 17:03:26,244-Speed 3429.23 samples/sec Loss 51.1238 LearningRate 0.1996 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-01-19 17:03:29,262-Speed 3394.70 samples/sec Loss 50.8232 LearningRate 0.1995 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-01-19 17:03:32,187-Speed 3501.77 samples/sec Loss 50.2285 LearningRate 0.1995 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-01-19 17:03:35,111-Speed 3504.05 samples/sec Loss 49.9001 LearningRate 0.1995 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-19 17:03:38,052-Speed 3481.74 samples/sec Loss 49.4527 LearningRate 0.1994 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-19 17:03:41,032-Speed 3437.87 samples/sec Loss 49.4278 LearningRate 0.1994 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-19 17:03:43,990-Speed 3462.50 samples/sec Loss 48.8459 LearningRate 0.1994 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-19 17:03:47,036-Speed 3363.73 samples/sec Loss 48.5904 LearningRate 0.1993 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:03:49,960-Speed 3503.02 samples/sec Loss 48.3992 LearningRate 0.1993 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:03:52,896-Speed 3487.99 samples/sec Loss 48.1070 LearningRate 0.1993 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:03:55,865-Speed 3450.37 samples/sec Loss 47.8285 LearningRate 0.1992 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:03:58,822-Speed 3463.83 samples/sec Loss 47.6321 LearningRate 0.1992 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:04:01,799-Speed 3440.53 samples/sec Loss 47.3745 LearningRate 0.1992 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:04:04,793-Speed 3421.68 samples/sec Loss 47.1594 LearningRate 0.1991 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:04:07,716-Speed 3503.98 samples/sec Loss 46.9380 LearningRate 0.1991 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:04:10,646-Speed 3496.30 samples/sec Loss 46.8114 LearningRate 0.1991 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:04:13,607-Speed 3458.27 samples/sec Loss 46.5176 LearningRate 0.1991 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-19 17:04:16,545-Speed 3487.67 samples/sec Loss 46.4501 LearningRate 0.1990 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-01-19 17:04:19,496-Speed 3470.47 samples/sec Loss 46.2412 LearningRate 0.1990 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:22,420-Speed 3502.90 samples/sec Loss 46.0005 LearningRate 0.1990 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:25,346-Speed 3501.32 samples/sec Loss 45.9919 LearningRate 0.1989 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:28,272-Speed 3499.77 samples/sec Loss 45.7875 LearningRate 0.1989 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:31,198-Speed 3501.06 samples/sec Loss 45.5450 LearningRate 0.1989 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:34,115-Speed 3511.47 samples/sec Loss 45.4848 LearningRate 0.1988 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:37,030-Speed 3514.18 samples/sec Loss 45.4393 LearningRate 0.1988 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:39,959-Speed 3497.65 samples/sec Loss 45.1942 LearningRate 0.1988 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:42,876-Speed 3511.38 samples/sec Loss 45.1643 LearningRate 0.1987 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:04:45,795-Speed 3508.62 samples/sec Loss 44.9606 LearningRate 0.1987 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:04:48,726-Speed 3494.53 samples/sec Loss 44.9269 LearningRate 0.1987 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:04:51,648-Speed 3505.86 samples/sec Loss 44.7850 LearningRate 0.1986 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:04:54,586-Speed 3487.17 samples/sec Loss 44.7740 LearningRate 0.1986 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:04:57,619-Speed 3376.96 samples/sec Loss 44.5582 LearningRate 0.1986 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:00,605-Speed 3430.12 samples/sec Loss 44.4205 LearningRate 0.1985 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:03,539-Speed 3490.59 samples/sec Loss 44.3638 LearningRate 0.1985 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:06,482-Speed 3480.53 samples/sec Loss 44.2414 LearningRate 0.1985 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:09,431-Speed 3472.50 samples/sec Loss 44.1074 LearningRate 0.1985 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:12,438-Speed 3407.55 samples/sec Loss 44.0151 LearningRate 0.1984 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:15,435-Speed 3417.78 samples/sec Loss 43.9563 LearningRate 0.1984 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 524288 Required: 11 hours
Training: 2022-01-19 17:05:18,450-Speed 3397.34 samples/sec Loss 43.9274 LearningRate 0.1984 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 524288 Required: 11 hours
Training: 2022-01-19 17:05:21,387-Speed 3487.58 samples/sec Loss 43.7457 LearningRate 0.1983 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 524288 Required: 11 hours
Training: 2022-01-19 17:05:24,299-Speed 3516.86 samples/sec Loss 43.6820 LearningRate 0.1983 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:27,244-Speed 3478.85 samples/sec Loss 43.6474 LearningRate 0.1983 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:30,167-Speed 3504.01 samples/sec Loss 43.5112 LearningRate 0.1982 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:33,091-Speed 3502.85 samples/sec Loss 43.3520 LearningRate 0.1982 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:05:36,040-Speed 3474.62 samples/sec Loss 43.2248 LearningRate 0.1982 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:05:38,961-Speed 3506.65 samples/sec Loss 43.0608 LearningRate 0.1981 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:05:41,946-Speed 3431.13 samples/sec Loss 43.0319 LearningRate 0.1981 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:05:44,890-Speed 3479.55 samples/sec Loss 43.0575 LearningRate 0.1981 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:05:47,842-Speed 3470.07 samples/sec Loss 42.9067 LearningRate 0.1980 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:05:50,777-Speed 3489.49 samples/sec Loss 42.8240 LearningRate 0.1980 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:05:53,719-Speed 3482.02 samples/sec Loss 42.7323 LearningRate 0.1980 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:05:56,701-Speed 3434.41 samples/sec Loss 42.5545 LearningRate 0.1979 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:05:59,690-Speed 3427.71 samples/sec Loss 42.5254 LearningRate 0.1979 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:06:02,664-Speed 3443.20 samples/sec Loss 42.4383 LearningRate 0.1979 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:06:05,603-Speed 3485.45 samples/sec Loss 42.3070 LearningRate 0.1979 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:08,519-Speed 3512.94 samples/sec Loss 42.1634 LearningRate 0.1978 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:11,436-Speed 3511.72 samples/sec Loss 42.1107 LearningRate 0.1978 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:14,400-Speed 3455.38 samples/sec Loss 42.0176 LearningRate 0.1978 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:17,322-Speed 3506.14 samples/sec Loss 41.8977 LearningRate 0.1977 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:20,244-Speed 3505.30 samples/sec Loss 41.8917 LearningRate 0.1977 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:23,162-Speed 3509.26 samples/sec Loss 41.6907 LearningRate 0.1977 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:26,082-Speed 3508.16 samples/sec Loss 41.6282 LearningRate 0.1976 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:29,033-Speed 3471.23 samples/sec Loss 41.4646 LearningRate 0.1976 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:32,000-Speed 3452.83 samples/sec Loss 41.3710 LearningRate 0.1976 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:34,948-Speed 3474.07 samples/sec Loss 41.3405 LearningRate 0.1975 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:37,886-Speed 3486.38 samples/sec Loss 41.2437 LearningRate 0.1975 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:40,815-Speed 3497.67 samples/sec Loss 41.1015 LearningRate 0.1975 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:43,737-Speed 3505.21 samples/sec Loss 40.9676 LearningRate 0.1974 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:46,671-Speed 3491.46 samples/sec Loss 40.8784 LearningRate 0.1974 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:06:49,607-Speed 3487.74 samples/sec Loss 40.8402 LearningRate 0.1974 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:06:52,548-Speed 3482.84 samples/sec Loss 40.5871 LearningRate 0.1974 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:06:55,476-Speed 3498.31 samples/sec Loss 40.5105 LearningRate 0.1973 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:06:58,407-Speed 3494.50 samples/sec Loss 40.5330 LearningRate 0.1973 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:07:01,341-Speed 3491.48 samples/sec Loss 40.2249 LearningRate 0.1973 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:07:04,312-Speed 3446.66 samples/sec Loss 40.2429 LearningRate 0.1972 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:07:07,254-Speed 3482.53 samples/sec Loss 40.1224 LearningRate 0.1972 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:07:10,190-Speed 3488.67 samples/sec Loss 40.0119 LearningRate 0.1972 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:07:13,127-Speed 3487.20 samples/sec Loss 39.8710 LearningRate 0.1971 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:07:16,095-Speed 3452.05 samples/sec Loss 39.7371 LearningRate 0.1971 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 17:07:19,064-Speed 3448.79 samples/sec Loss 39.5668 LearningRate 0.1971 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:22,040-Speed 3442.25 samples/sec Loss 39.4527 LearningRate 0.1970 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:24,994-Speed 3467.78 samples/sec Loss 39.2433 LearningRate 0.1970 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:27,927-Speed 3492.17 samples/sec Loss 39.2076 LearningRate 0.1970 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:30,859-Speed 3493.67 samples/sec Loss 39.0644 LearningRate 0.1969 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:33,840-Speed 3435.55 samples/sec Loss 39.0449 LearningRate 0.1969 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:36,839-Speed 3416.03 samples/sec Loss 38.7994 LearningRate 0.1969 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:39,822-Speed 3433.15 samples/sec Loss 38.6051 LearningRate 0.1968 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:42,744-Speed 3505.93 samples/sec Loss 38.5101 LearningRate 0.1968 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:45,706-Speed 3457.85 samples/sec Loss 38.4295 LearningRate 0.1968 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:48,624-Speed 3510.19 samples/sec Loss 38.3236 LearningRate 0.1968 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:51,612-Speed 3427.83 samples/sec Loss 38.1279 LearningRate 0.1967 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:54,637-Speed 3386.54 samples/sec Loss 37.9762 LearningRate 0.1967 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:07:57,568-Speed 3494.07 samples/sec Loss 37.9239 LearningRate 0.1967 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:00,525-Speed 3463.90 samples/sec Loss 37.5742 LearningRate 0.1966 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:03,482-Speed 3463.59 samples/sec Loss 37.4356 LearningRate 0.1966 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:06,412-Speed 3496.19 samples/sec Loss 37.3079 LearningRate 0.1966 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:09,408-Speed 3419.55 samples/sec Loss 37.2058 LearningRate 0.1965 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:12,354-Speed 3476.43 samples/sec Loss 37.0652 LearningRate 0.1965 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:15,276-Speed 3504.83 samples/sec Loss 36.8940 LearningRate 0.1965 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:18,200-Speed 3503.26 samples/sec Loss 36.5589 LearningRate 0.1964 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:08:21,193-Speed 3421.93 samples/sec Loss 36.6539 LearningRate 0.1964 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:08:24,130-Speed 3487.29 samples/sec Loss 36.3826 LearningRate 0.1964 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:08:27,130-Speed 3414.98 samples/sec Loss 36.2610 LearningRate 0.1963 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:08:30,086-Speed 3464.38 samples/sec Loss 36.0213 LearningRate 0.1963 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:08:33,018-Speed 3493.47 samples/sec Loss 35.8079 LearningRate 0.1963 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:08:35,958-Speed 3483.75 samples/sec Loss 35.7953 LearningRate 0.1963 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:08:38,910-Speed 3470.22 samples/sec Loss 35.5701 LearningRate 0.1962 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:08:41,825-Speed 3514.49 samples/sec Loss 35.2269 LearningRate 0.1962 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:44,829-Speed 3409.21 samples/sec Loss 35.2352 LearningRate 0.1962 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:47,753-Speed 3502.60 samples/sec Loss 35.1039 LearningRate 0.1961 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:50,688-Speed 3489.65 samples/sec Loss 34.8547 LearningRate 0.1961 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:53,611-Speed 3504.02 samples/sec Loss 34.7911 LearningRate 0.1961 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:56,542-Speed 3494.96 samples/sec Loss 34.5784 LearningRate 0.1960 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:08:59,492-Speed 3472.33 samples/sec Loss 34.3999 LearningRate 0.1960 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:09:02,418-Speed 3501.61 samples/sec Loss 34.4062 LearningRate 0.1960 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:09:05,371-Speed 3468.08 samples/sec Loss 34.1352 LearningRate 0.1959 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:09:08,306-Speed 3489.78 samples/sec Loss 33.8435 LearningRate 0.1959 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 17:09:11,237-Speed 3494.38 samples/sec Loss 33.9978 LearningRate 0.1959 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:14,223-Speed 3430.71 samples/sec Loss 33.5305 LearningRate 0.1958 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:17,203-Speed 3436.54 samples/sec Loss 33.4122 LearningRate 0.1958 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:20,130-Speed 3499.87 samples/sec Loss 33.2946 LearningRate 0.1958 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:23,070-Speed 3484.11 samples/sec Loss 32.9362 LearningRate 0.1958 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:26,032-Speed 3457.61 samples/sec Loss 32.7478 LearningRate 0.1957 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:28,974-Speed 3482.42 samples/sec Loss 32.6850 LearningRate 0.1957 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:31,945-Speed 3447.16 samples/sec Loss 32.6137 LearningRate 0.1957 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:34,870-Speed 3502.00 samples/sec Loss 32.1936 LearningRate 0.1956 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 17:09:37,838-Speed 3451.92 samples/sec Loss 32.2442 LearningRate 0.1956 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:09:40,764-Speed 3499.86 samples/sec Loss 32.1195 LearningRate 0.1956 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 524288 Required: 10 hours
Training: 2022-01-19 17:09:43,688-Speed 3502.91 samples/sec Loss 31.8167 LearningRate 0.1955 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:09:46,631-Speed 3480.60 samples/sec Loss 31.6698 LearningRate 0.1955 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:09:49,560-Speed 3496.43 samples/sec Loss 31.5920 LearningRate 0.1955 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:09:52,492-Speed 3495.09 samples/sec Loss 31.3047 LearningRate 0.1954 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:09:55,417-Speed 3501.10 samples/sec Loss 31.3651 LearningRate 0.1954 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:09:58,358-Speed 3482.06 samples/sec Loss 30.6181 LearningRate 0.1954 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:01,299-Speed 3482.52 samples/sec Loss 30.4937 LearningRate 0.1953 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:04,234-Speed 3490.46 samples/sec Loss 30.4039 LearningRate 0.1953 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:07,169-Speed 3490.38 samples/sec Loss 30.4461 LearningRate 0.1953 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:10,122-Speed 3467.87 samples/sec Loss 30.1298 LearningRate 0.1953 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:13,051-Speed 3498.35 samples/sec Loss 29.8494 LearningRate 0.1952 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:15,996-Speed 3476.78 samples/sec Loss 30.0017 LearningRate 0.1952 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:18,928-Speed 3494.61 samples/sec Loss 29.8911 LearningRate 0.1952 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:21,857-Speed 3497.53 samples/sec Loss 29.5684 LearningRate 0.1951 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:24,784-Speed 3499.35 samples/sec Loss 29.3901 LearningRate 0.1951 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:10:27,710-Speed 3500.42 samples/sec Loss 29.3721 LearningRate 0.1951 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:10:30,638-Speed 3498.18 samples/sec Loss 28.9707 LearningRate 0.1950 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:10:33,577-Speed 3484.86 samples/sec Loss 28.7904 LearningRate 0.1950 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:10:36,513-Speed 3489.48 samples/sec Loss 28.6094 LearningRate 0.1950 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:10:39,449-Speed 3487.85 samples/sec Loss 28.5633 LearningRate 0.1949 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:10:42,370-Speed 3507.04 samples/sec Loss 28.5717 LearningRate 0.1949 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:45,293-Speed 3503.81 samples/sec Loss 28.4833 LearningRate 0.1949 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:48,219-Speed 3501.48 samples/sec Loss 28.1818 LearningRate 0.1948 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:51,146-Speed 3499.90 samples/sec Loss 28.0325 LearningRate 0.1948 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:54,086-Speed 3483.42 samples/sec Loss 27.8339 LearningRate 0.1948 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:57,019-Speed 3492.27 samples/sec Loss 27.6626 LearningRate 0.1948 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:10:59,957-Speed 3485.92 samples/sec Loss 27.4481 LearningRate 0.1947 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:02,889-Speed 3494.42 samples/sec Loss 27.2539 LearningRate 0.1947 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:05,824-Speed 3489.69 samples/sec Loss 27.4094 LearningRate 0.1947 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:08,818-Speed 3420.22 samples/sec Loss 27.1274 LearningRate 0.1946 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:11,795-Speed 3441.32 samples/sec Loss 27.1167 LearningRate 0.1946 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:11:14,725-Speed 3496.20 samples/sec Loss 26.7496 LearningRate 0.1946 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:17,683-Speed 3463.13 samples/sec Loss 26.6677 LearningRate 0.1945 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:20,609-Speed 3500.39 samples/sec Loss 26.5002 LearningRate 0.1945 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:23,536-Speed 3499.10 samples/sec Loss 26.6674 LearningRate 0.1945 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:26,468-Speed 3493.74 samples/sec Loss 26.1699 LearningRate 0.1944 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:29,398-Speed 3495.95 samples/sec Loss 25.9790 LearningRate 0.1944 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:32,335-Speed 3487.73 samples/sec Loss 25.9414 LearningRate 0.1944 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:35,272-Speed 3487.62 samples/sec Loss 25.9367 LearningRate 0.1943 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:38,202-Speed 3495.61 samples/sec Loss 25.7031 LearningRate 0.1943 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:41,132-Speed 3496.17 samples/sec Loss 25.5749 LearningRate 0.1943 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:11:44,067-Speed 3490.10 samples/sec Loss 25.5321 LearningRate 0.1943 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:11:47,003-Speed 3488.62 samples/sec Loss 25.3835 LearningRate 0.1942 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:11:49,936-Speed 3492.20 samples/sec Loss 25.1910 LearningRate 0.1942 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:11:52,866-Speed 3497.80 samples/sec Loss 25.1751 LearningRate 0.1942 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:11:55,798-Speed 3493.28 samples/sec Loss 25.0854 LearningRate 0.1941 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:11:58,784-Speed 3429.77 samples/sec Loss 25.0101 LearningRate 0.1941 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:12:01,830-Speed 3362.53 samples/sec Loss 24.8138 LearningRate 0.1941 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:12:04,823-Speed 3422.81 samples/sec Loss 24.6637 LearningRate 0.1940 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-19 17:12:07,808-Speed 3432.70 samples/sec Loss 24.5190 LearningRate 0.1940 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:10,800-Speed 3423.56 samples/sec Loss 24.3637 LearningRate 0.1940 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:13,751-Speed 3470.59 samples/sec Loss 24.4438 LearningRate 0.1939 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:16,679-Speed 3498.21 samples/sec Loss 24.3318 LearningRate 0.1939 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:19,613-Speed 3491.09 samples/sec Loss 24.1874 LearningRate 0.1939 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:22,545-Speed 3493.13 samples/sec Loss 24.0375 LearningRate 0.1938 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:25,496-Speed 3470.90 samples/sec Loss 23.9123 LearningRate 0.1938 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:28,427-Speed 3494.63 samples/sec Loss 23.8935 LearningRate 0.1938 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:31,355-Speed 3497.66 samples/sec Loss 23.7075 LearningRate 0.1938 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:12:34,294-Speed 3485.03 samples/sec Loss 23.6559 LearningRate 0.1937 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-19 17:13:17,288-[lfw][2000]XNorm: 23.779532
Training: 2022-01-19 17:13:17,288-[lfw][2000]Accuracy-Flip: 0.97983+-0.00652
Training: 2022-01-19 17:13:17,289-[lfw][2000]Accuracy-Highest: 0.97983
Training: 2022-01-19 17:14:07,133-[cfp_fp][2000]XNorm: 20.717253
Training: 2022-01-19 17:14:07,134-[cfp_fp][2000]Accuracy-Flip: 0.81529+-0.01398
Training: 2022-01-19 17:14:07,134-[cfp_fp][2000]Accuracy-Highest: 0.81529
Training: 2022-01-19 17:14:49,977-[agedb_30][2000]XNorm: 22.966134
Training: 2022-01-19 17:14:49,978-[agedb_30][2000]Accuracy-Flip: 0.86967+-0.02236
Training: 2022-01-19 17:14:49,979-[agedb_30][2000]Accuracy-Highest: 0.86967
Training: 2022-01-19 17:14:52,899-Speed 73.88 samples/sec Loss 23.4274 LearningRate 0.1937 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-01-19 17:14:55,813-Speed 3514.79 samples/sec Loss 23.3924 LearningRate 0.1937 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-01-19 17:14:58,732-Speed 3508.96 samples/sec Loss 23.3288 LearningRate 0.1936 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-01-19 17:15:01,653-Speed 3506.73 samples/sec Loss 23.4352 LearningRate 0.1936 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-01-19 17:15:04,572-Speed 3509.44 samples/sec Loss 23.1000 LearningRate 0.1936 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-01-19 17:15:07,498-Speed 3500.62 samples/sec Loss 23.0390 LearningRate 0.1935 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 13 hours
Training:
2022-01-19 17:15:10,421-Speed 3503.85 samples/sec Loss 23.1599 LearningRate 0.1935 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-19 17:15:13,367-Speed 3476.33 samples/sec Loss 22.9570 LearningRate 0.1935 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-19 17:15:16,325-Speed 3462.63 samples/sec Loss 22.7203 LearningRate 0.1934 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-19 17:15:19,253-Speed 3499.18 samples/sec Loss 22.7331 LearningRate 0.1934 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-19 17:15:22,204-Speed 3470.51 samples/sec Loss 22.6126 LearningRate 0.1934 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-19 17:15:25,134-Speed 3495.97 samples/sec Loss 22.8072 LearningRate 0.1934 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-19 17:15:28,090-Speed 3464.72 samples/sec Loss 22.4047 LearningRate 0.1933 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-19 17:15:31,021-Speed 3495.53 samples/sec Loss 22.3894 LearningRate 0.1933 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-19 17:15:33,948-Speed 3499.14 samples/sec Loss 22.3064 LearningRate 0.1933 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-19 17:15:36,853-Speed 3526.73 samples/sec Loss 22.1769 LearningRate 0.1932 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-19 17:15:39,779-Speed 3499.84 samples/sec Loss 22.0707 LearningRate 0.1932 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:15:42,717-Speed 3486.39 samples/sec Loss 22.2243 LearningRate 0.1932 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:15:45,643-Speed 3501.51 
samples/sec Loss 21.8383 LearningRate 0.1931 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:15:48,600-Speed 3463.43 samples/sec Loss 22.1487 LearningRate 0.1931 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:15:51,524-Speed 3503.40 samples/sec Loss 22.1224 LearningRate 0.1931 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:15:54,447-Speed 3503.60 samples/sec Loss 21.7823 LearningRate 0.1930 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:15:57,371-Speed 3503.63 samples/sec Loss 21.8248 LearningRate 0.1930 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:16:00,296-Speed 3501.72 samples/sec Loss 21.7249 LearningRate 0.1930 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:16:03,218-Speed 3504.50 samples/sec Loss 21.7831 LearningRate 0.1929 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:16:06,139-Speed 3507.00 samples/sec Loss 21.7068 LearningRate 0.1929 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:09,090-Speed 3471.53 samples/sec Loss 21.4980 LearningRate 0.1929 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:12,085-Speed 3420.39 samples/sec Loss 21.4912 LearningRate 0.1929 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:15,014-Speed 3496.76 samples/sec Loss 21.4888 LearningRate 0.1928 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:18,075-Speed 3346.47 samples/sec Loss 21.3780 LearningRate 0.1928 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:21,000-Speed 3501.75 samples/sec Loss 21.3569 LearningRate 0.1928 Epoch: 0 
Global Step: 2310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:23,957-Speed 3463.64 samples/sec Loss 21.2982 LearningRate 0.1927 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:26,905-Speed 3475.42 samples/sec Loss 21.2359 LearningRate 0.1927 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:29,831-Speed 3500.44 samples/sec Loss 21.1317 LearningRate 0.1927 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:32,781-Speed 3472.38 samples/sec Loss 20.8550 LearningRate 0.1926 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:16:35,727-Speed 3476.25 samples/sec Loss 20.9812 LearningRate 0.1926 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:16:38,714-Speed 3429.64 samples/sec Loss 20.9082 LearningRate 0.1926 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:16:41,684-Speed 3449.60 samples/sec Loss 21.0147 LearningRate 0.1925 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:16:44,649-Speed 3453.83 samples/sec Loss 20.7462 LearningRate 0.1925 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:16:47,593-Speed 3479.34 samples/sec Loss 20.6946 LearningRate 0.1925 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:16:50,526-Speed 3493.01 samples/sec Loss 20.8176 LearningRate 0.1924 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:16:53,464-Speed 3486.29 samples/sec Loss 20.7316 LearningRate 0.1924 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:16:56,404-Speed 3496.15 samples/sec Loss 20.8164 LearningRate 0.1924 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-19 17:16:59,328-Speed 3503.35 samples/sec Loss 20.5315 LearningRate 0.1924 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:02,255-Speed 3498.49 samples/sec Loss 20.4083 LearningRate 0.1923 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:05,186-Speed 3496.02 samples/sec Loss 20.2927 LearningRate 0.1923 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:17:08,116-Speed 3495.24 samples/sec Loss 20.3794 LearningRate 0.1923 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:17:11,064-Speed 3474.79 samples/sec Loss 20.3896 LearningRate 0.1922 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:17:14,001-Speed 3487.83 samples/sec Loss 20.1426 LearningRate 0.1922 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:17:16,927-Speed 3500.49 samples/sec Loss 20.3146 LearningRate 0.1922 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:17:19,880-Speed 3469.08 samples/sec Loss 20.0151 LearningRate 0.1921 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:17:22,851-Speed 3448.14 samples/sec Loss 20.1282 LearningRate 0.1921 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:17:25,811-Speed 3460.59 samples/sec Loss 19.9929 LearningRate 0.1921 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:28,730-Speed 3508.09 samples/sec Loss 20.0817 LearningRate 0.1920 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:31,657-Speed 3500.31 samples/sec Loss 20.1346 LearningRate 0.1920 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 
17:17:34,585-Speed 3498.25 samples/sec Loss 20.0553 LearningRate 0.1920 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:37,516-Speed 3494.61 samples/sec Loss 19.8753 LearningRate 0.1920 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:40,443-Speed 3499.97 samples/sec Loss 19.9086 LearningRate 0.1919 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:43,365-Speed 3505.28 samples/sec Loss 19.9009 LearningRate 0.1919 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:46,288-Speed 3505.59 samples/sec Loss 19.7875 LearningRate 0.1919 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:49,236-Speed 3473.26 samples/sec Loss 19.8220 LearningRate 0.1918 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:52,163-Speed 3500.41 samples/sec Loss 19.6707 LearningRate 0.1918 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:17:55,120-Speed 3463.98 samples/sec Loss 19.6911 LearningRate 0.1918 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:17:58,049-Speed 3496.76 samples/sec Loss 19.7453 LearningRate 0.1917 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:00,962-Speed 3516.72 samples/sec Loss 19.7236 LearningRate 0.1917 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:03,888-Speed 3499.92 samples/sec Loss 19.6524 LearningRate 0.1917 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:06,810-Speed 3506.20 samples/sec Loss 19.6120 LearningRate 0.1916 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:09,736-Speed 3500.54 samples/sec Loss 
19.7615 LearningRate 0.1916 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:12,669-Speed 3492.80 samples/sec Loss 19.4808 LearningRate 0.1916 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:15,590-Speed 3505.44 samples/sec Loss 19.4369 LearningRate 0.1916 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:18,523-Speed 3493.07 samples/sec Loss 19.4862 LearningRate 0.1915 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:21,448-Speed 3502.01 samples/sec Loss 19.3072 LearningRate 0.1915 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:24,381-Speed 3492.11 samples/sec Loss 19.2123 LearningRate 0.1915 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:27,324-Speed 3480.89 samples/sec Loss 19.2079 LearningRate 0.1914 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:18:30,315-Speed 3423.51 samples/sec Loss 19.3863 LearningRate 0.1914 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:33,250-Speed 3491.04 samples/sec Loss 19.2000 LearningRate 0.1914 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:36,177-Speed 3498.67 samples/sec Loss 19.0360 LearningRate 0.1913 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:39,108-Speed 3494.42 samples/sec Loss 19.2886 LearningRate 0.1913 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:42,035-Speed 3499.34 samples/sec Loss 19.3809 LearningRate 0.1913 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:44,958-Speed 3504.22 samples/sec Loss 18.9651 LearningRate 0.1912 Epoch: 0 Global Step: 
2800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:47,900-Speed 3482.08 samples/sec Loss 18.9973 LearningRate 0.1912 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:50,832-Speed 3493.54 samples/sec Loss 19.1171 LearningRate 0.1912 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:53,761-Speed 3496.51 samples/sec Loss 19.0969 LearningRate 0.1911 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:56,697-Speed 3490.16 samples/sec Loss 18.7288 LearningRate 0.1911 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:18:59,636-Speed 3484.26 samples/sec Loss 18.9458 LearningRate 0.1911 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:19:02,550-Speed 3515.27 samples/sec Loss 18.9590 LearningRate 0.1911 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:05,489-Speed 3485.10 samples/sec Loss 18.7565 LearningRate 0.1910 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:08,414-Speed 3502.21 samples/sec Loss 18.8788 LearningRate 0.1910 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:11,361-Speed 3475.56 samples/sec Loss 18.7995 LearningRate 0.1910 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:14,320-Speed 3461.68 samples/sec Loss 18.9387 LearningRate 0.1909 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:17,279-Speed 3462.11 samples/sec Loss 18.9293 LearningRate 0.1909 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:20,209-Speed 3495.03 samples/sec Loss 18.7119 LearningRate 0.1909 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-01-19 17:19:23,146-Speed 3487.89 samples/sec Loss 18.5279 LearningRate 0.1908 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:26,072-Speed 3500.59 samples/sec Loss 18.6303 LearningRate 0.1908 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:29,024-Speed 3469.46 samples/sec Loss 18.6712 LearningRate 0.1908 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:31,945-Speed 3507.74 samples/sec Loss 18.4815 LearningRate 0.1907 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:34,901-Speed 3464.36 samples/sec Loss 18.5200 LearningRate 0.1907 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:37,832-Speed 3495.15 samples/sec Loss 18.5200 LearningRate 0.1907 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:40,808-Speed 3441.61 samples/sec Loss 18.2874 LearningRate 0.1907 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:43,860-Speed 3356.49 samples/sec Loss 18.6916 LearningRate 0.1906 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:46,797-Speed 3486.94 samples/sec Loss 18.3434 LearningRate 0.1906 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:49,807-Speed 3402.80 samples/sec Loss 18.3837 LearningRate 0.1906 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:52,765-Speed 3462.76 samples/sec Loss 18.3108 LearningRate 0.1905 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:19:55,693-Speed 3498.14 samples/sec Loss 18.4173 LearningRate 0.1905 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 
17:19:58,604-Speed 3519.11 samples/sec Loss 18.3692 LearningRate 0.1905 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:01,528-Speed 3503.25 samples/sec Loss 18.5517 LearningRate 0.1904 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:04,462-Speed 3490.99 samples/sec Loss 18.4367 LearningRate 0.1904 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:07,390-Speed 3498.15 samples/sec Loss 18.4017 LearningRate 0.1904 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:10,319-Speed 3497.29 samples/sec Loss 18.1325 LearningRate 0.1903 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:13,248-Speed 3497.74 samples/sec Loss 18.2456 LearningRate 0.1903 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:16,175-Speed 3499.10 samples/sec Loss 18.2035 LearningRate 0.1903 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:19,103-Speed 3497.19 samples/sec Loss 18.1690 LearningRate 0.1903 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:22,085-Speed 3435.44 samples/sec Loss 18.3611 LearningRate 0.1902 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:25,034-Speed 3473.05 samples/sec Loss 18.0983 LearningRate 0.1902 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:20:27,974-Speed 3484.56 samples/sec Loss 18.2121 LearningRate 0.1902 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:30,911-Speed 3486.70 samples/sec Loss 18.1239 LearningRate 0.1901 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:33,846-Speed 3489.44 samples/sec Loss 17.9552 
LearningRate 0.1901 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:36,779-Speed 3493.42 samples/sec Loss 18.0292 LearningRate 0.1901 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:39,727-Speed 3474.40 samples/sec Loss 18.1017 LearningRate 0.1900 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:42,656-Speed 3496.92 samples/sec Loss 17.9741 LearningRate 0.1900 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:45,581-Speed 3502.30 samples/sec Loss 17.6834 LearningRate 0.1900 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:48,525-Speed 3478.21 samples/sec Loss 17.9730 LearningRate 0.1899 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:51,455-Speed 3495.75 samples/sec Loss 17.8465 LearningRate 0.1899 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:54,447-Speed 3423.08 samples/sec Loss 17.5787 LearningRate 0.1899 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:20:57,364-Speed 3511.72 samples/sec Loss 17.6727 LearningRate 0.1899 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:00,294-Speed 3496.51 samples/sec Loss 17.7224 LearningRate 0.1898 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:03,233-Speed 3485.29 samples/sec Loss 17.9060 LearningRate 0.1898 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:06,178-Speed 3478.36 samples/sec Loss 17.6304 LearningRate 0.1898 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:09,152-Speed 3443.62 samples/sec Loss 17.8220 LearningRate 0.1897 Epoch: 0 Global Step: 
3290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:12,100-Speed 3475.10 samples/sec Loss 17.9008 LearningRate 0.1897 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:15,023-Speed 3503.10 samples/sec Loss 17.8633 LearningRate 0.1897 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:17,958-Speed 3490.57 samples/sec Loss 17.6245 LearningRate 0.1896 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:20,938-Speed 3437.50 samples/sec Loss 17.5302 LearningRate 0.1896 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:23,926-Speed 3426.95 samples/sec Loss 17.4218 LearningRate 0.1896 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:26,944-Speed 3394.58 samples/sec Loss 17.6462 LearningRate 0.1895 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:21:29,948-Speed 3409.48 samples/sec Loss 17.4998 LearningRate 0.1895 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:32,930-Speed 3435.28 samples/sec Loss 17.5573 LearningRate 0.1895 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:35,867-Speed 3487.50 samples/sec Loss 17.5735 LearningRate 0.1895 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:38,833-Speed 3453.55 samples/sec Loss 17.6109 LearningRate 0.1894 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:41,814-Speed 3436.23 samples/sec Loss 17.5160 LearningRate 0.1894 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:44,751-Speed 3487.14 samples/sec Loss 17.6265 LearningRate 0.1894 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-01-19 17:21:47,728-Speed 3440.70 samples/sec Loss 17.3578 LearningRate 0.1893 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:50,657-Speed 3496.96 samples/sec Loss 17.2829 LearningRate 0.1893 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:53,592-Speed 3489.83 samples/sec Loss 17.3799 LearningRate 0.1893 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:56,531-Speed 3484.90 samples/sec Loss 17.3693 LearningRate 0.1892 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:21:59,458-Speed 3498.99 samples/sec Loss 17.4038 LearningRate 0.1892 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:22:02,414-Speed 3466.37 samples/sec Loss 17.5051 LearningRate 0.1892 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:22:05,405-Speed 3424.28 samples/sec Loss 17.3418 LearningRate 0.1891 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:22:08,389-Speed 3432.63 samples/sec Loss 17.5693 LearningRate 0.1891 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:22:11,425-Speed 3373.75 samples/sec Loss 17.2029 LearningRate 0.1891 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:22:14,342-Speed 3511.49 samples/sec Loss 17.3087 LearningRate 0.1891 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:17,279-Speed 3487.31 samples/sec Loss 17.1384 LearningRate 0.1890 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:20,208-Speed 3496.55 samples/sec Loss 17.0819 LearningRate 0.1890 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
17:22:23,143-Speed 3489.99 samples/sec Loss 17.0674 LearningRate 0.1890 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:26,076-Speed 3493.12 samples/sec Loss 17.1460 LearningRate 0.1889 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:29,008-Speed 3493.42 samples/sec Loss 17.1462 LearningRate 0.1889 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:31,951-Speed 3480.89 samples/sec Loss 17.0429 LearningRate 0.1889 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:34,879-Speed 3498.16 samples/sec Loss 17.3243 LearningRate 0.1888 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:37,809-Speed 3495.60 samples/sec Loss 17.1374 LearningRate 0.1888 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:40,739-Speed 3496.00 samples/sec Loss 17.1694 LearningRate 0.1888 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:43,671-Speed 3493.21 samples/sec Loss 17.3344 LearningRate 0.1887 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:22:46,602-Speed 3493.78 samples/sec Loss 17.0758 LearningRate 0.1887 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:49,537-Speed 3490.69 samples/sec Loss 17.1163 LearningRate 0.1887 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:52,472-Speed 3489.30 samples/sec Loss 17.1839 LearningRate 0.1887 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:55,461-Speed 3427.31 samples/sec Loss 17.1371 LearningRate 0.1886 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:22:58,402-Speed 3483.04 samples/sec Loss 
17.0049 LearningRate 0.1886 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:01,363-Speed 3458.78 samples/sec Loss 16.8533 LearningRate 0.1886 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:04,297-Speed 3490.99 samples/sec Loss 17.1144 LearningRate 0.1885 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:07,228-Speed 3495.39 samples/sec Loss 16.9424 LearningRate 0.1885 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:10,157-Speed 3497.03 samples/sec Loss 16.7725 LearningRate 0.1885 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:13,089-Speed 3493.07 samples/sec Loss 17.0332 LearningRate 0.1884 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:16,026-Speed 3487.18 samples/sec Loss 16.8270 LearningRate 0.1884 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:23:18,967-Speed 3482.37 samples/sec Loss 17.0525 LearningRate 0.1884 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:23:21,920-Speed 3470.54 samples/sec Loss 17.0447 LearningRate 0.1883 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:23:24,864-Speed 3479.14 samples/sec Loss 16.9724 LearningRate 0.1883 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:23:27,793-Speed 3497.16 samples/sec Loss 16.9692 LearningRate 0.1883 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:30,816-Speed 3387.76 samples/sec Loss 16.8043 LearningRate 0.1883 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:33,751-Speed 3490.12 samples/sec Loss 16.8742 LearningRate 0.1882 Epoch: 0 Global 
Step: 3780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:36,740-Speed 3426.29 samples/sec Loss 16.7270 LearningRate 0.1882 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:39,681-Speed 3482.94 samples/sec Loss 16.4727 LearningRate 0.1882 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:23:42,600-Speed 3509.55 samples/sec Loss 16.8186 LearningRate 0.1881 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:23:45,527-Speed 3499.19 samples/sec Loss 16.7100 LearningRate 0.1881 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:23:48,464-Speed 3487.58 samples/sec Loss 16.7565 LearningRate 0.1881 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:23:51,395-Speed 3493.95 samples/sec Loss 16.7592 LearningRate 0.1880 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:23:54,331-Speed 3488.27 samples/sec Loss 16.7702 LearningRate 0.1880 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:23:57,276-Speed 3478.11 samples/sec Loss 16.9627 LearningRate 0.1880 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:24:00,288-Speed 3400.79 samples/sec Loss 16.7108 LearningRate 0.1879 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:24:03,356-Speed 3339.02 samples/sec Loss 16.8404 LearningRate 0.1879 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:24:06,299-Speed 3479.88 samples/sec Loss 16.6288 LearningRate 0.1879 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:24:09,321-Speed 3390.13 samples/sec Loss 16.8636 LearningRate 0.1879 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-19 17:24:12,298-Speed 3440.17 samples/sec Loss 16.7663 LearningRate 0.1878 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:15,229-Speed 3494.61 samples/sec Loss 16.7920 LearningRate 0.1878 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:18,200-Speed 3447.94 samples/sec Loss 16.6335 LearningRate 0.1878 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:21,130-Speed 3495.82 samples/sec Loss 16.7075 LearningRate 0.1877 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:24,086-Speed 3465.12 samples/sec Loss 16.6730 LearningRate 0.1877 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:27,029-Speed 3480.58 samples/sec Loss 16.5707 LearningRate 0.1877 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:30,001-Speed 3446.01 samples/sec Loss 16.7187 LearningRate 0.1876 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:32,945-Speed 3479.48 samples/sec Loss 16.5515 LearningRate 0.1876 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:35,875-Speed 3495.12 samples/sec Loss 16.7594 LearningRate 0.1876 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:24:38,809-Speed 3491.36 samples/sec Loss 16.4490 LearningRate 0.1875 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:25:21,588-[lfw][4000]XNorm: 22.247099 Training: 2022-01-19 17:25:21,588-[lfw][4000]Accuracy-Flip: 0.99133+-0.00407 Training: 2022-01-19 17:25:21,589-[lfw][4000]Accuracy-Highest: 0.99133 Training: 2022-01-19 17:26:11,180-[cfp_fp][4000]XNorm: 19.285814 Training: 2022-01-19 17:26:11,181-[cfp_fp][4000]Accuracy-Flip: 0.88886+-0.01673 Training: 
2022-01-19 17:26:11,182-[cfp_fp][4000]Accuracy-Highest: 0.88886 Training: 2022-01-19 17:26:54,072-[agedb_30][4000]XNorm: 21.806404 Training: 2022-01-19 17:26:54,073-[agedb_30][4000]Accuracy-Flip: 0.93333+-0.01485 Training: 2022-01-19 17:26:54,073-[agedb_30][4000]Accuracy-Highest: 0.93333 Training: 2022-01-19 17:26:56,990-Speed 74.11 samples/sec Loss 16.5922 LearningRate 0.1875 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:26:59,901-Speed 3518.91 samples/sec Loss 16.5072 LearningRate 0.1875 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:02,812-Speed 3518.10 samples/sec Loss 16.7239 LearningRate 0.1875 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:05,728-Speed 3513.73 samples/sec Loss 16.4044 LearningRate 0.1874 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:08,662-Speed 3490.58 samples/sec Loss 16.4105 LearningRate 0.1874 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:11,614-Speed 3469.77 samples/sec Loss 16.6501 LearningRate 0.1874 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:14,536-Speed 3504.17 samples/sec Loss 16.4646 LearningRate 0.1873 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:17,469-Speed 3493.04 samples/sec Loss 16.3820 LearningRate 0.1873 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:20,378-Speed 3521.32 samples/sec Loss 16.6702 LearningRate 0.1873 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:23,302-Speed 3502.86 samples/sec Loss 16.3040 LearningRate 0.1872 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:26,236-Speed 3492.11 samples/sec Loss 16.2995 
LearningRate 0.1872 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:29,163-Speed 3499.49 samples/sec Loss 16.4459 LearningRate 0.1872 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:32,103-Speed 3483.25 samples/sec Loss 16.5403 LearningRate 0.1871 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:35,034-Speed 3494.85 samples/sec Loss 16.4765 LearningRate 0.1871 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:37,972-Speed 3486.15 samples/sec Loss 16.5093 LearningRate 0.1871 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:40,917-Speed 3479.00 samples/sec Loss 16.3068 LearningRate 0.1871 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:43,854-Speed 3487.47 samples/sec Loss 16.5199 LearningRate 0.1870 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:46,785-Speed 3494.09 samples/sec Loss 16.4140 LearningRate 0.1870 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:27:49,713-Speed 3497.96 samples/sec Loss 16.2285 LearningRate 0.1870 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:52,667-Speed 3467.42 samples/sec Loss 16.4028 LearningRate 0.1869 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:55,643-Speed 3442.06 samples/sec Loss 16.5013 LearningRate 0.1869 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:27:58,577-Speed 3491.04 samples/sec Loss 16.3834 LearningRate 0.1869 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:01,506-Speed 3497.11 samples/sec Loss 16.2859 LearningRate 0.1868 Epoch: 0 Global Step: 
4230 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:04,457-Speed 3470.56 samples/sec Loss 16.1292 LearningRate 0.1868 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:07,385-Speed 3498.42 samples/sec Loss 16.3176 LearningRate 0.1868 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:10,328-Speed 3479.57 samples/sec Loss 16.3262 LearningRate 0.1868 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:13,273-Speed 3478.61 samples/sec Loss 16.2541 LearningRate 0.1867 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:16,236-Speed 3456.60 samples/sec Loss 16.2180 LearningRate 0.1867 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:19,176-Speed 3484.71 samples/sec Loss 16.2929 LearningRate 0.1867 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:22,115-Speed 3484.74 samples/sec Loss 16.2668 LearningRate 0.1866 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:25,043-Speed 3498.23 samples/sec Loss 16.2428 LearningRate 0.1866 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:27,979-Speed 3488.61 samples/sec Loss 16.3327 LearningRate 0.1866 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:30,918-Speed 3485.17 samples/sec Loss 16.2465 LearningRate 0.1865 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:33,847-Speed 3496.53 samples/sec Loss 16.1160 LearningRate 0.1865 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:36,784-Speed 3488.40 samples/sec Loss 16.1113 LearningRate 0.1865 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 262144 Required: 12 
hours Training: 2022-01-19 17:28:39,736-Speed 3469.00 samples/sec Loss 15.9901 LearningRate 0.1864 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:42,665-Speed 3496.85 samples/sec Loss 16.1588 LearningRate 0.1864 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:28:45,593-Speed 3498.60 samples/sec Loss 16.2542 LearningRate 0.1864 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:28:48,527-Speed 3491.68 samples/sec Loss 16.3159 LearningRate 0.1864 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:28:51,457-Speed 3495.06 samples/sec Loss 15.9246 LearningRate 0.1863 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:28:54,386-Speed 3496.77 samples/sec Loss 16.4021 LearningRate 0.1863 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:28:57,317-Speed 3495.57 samples/sec Loss 16.1326 LearningRate 0.1863 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:00,248-Speed 3495.06 samples/sec Loss 16.0619 LearningRate 0.1862 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:03,173-Speed 3500.54 samples/sec Loss 16.1022 LearningRate 0.1862 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:06,139-Speed 3453.82 samples/sec Loss 16.1973 LearningRate 0.1862 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:09,105-Speed 3453.39 samples/sec Loss 15.9982 LearningRate 0.1861 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:12,052-Speed 3475.08 samples/sec Loss 16.1746 LearningRate 0.1861 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 
17:29:14,979-Speed 3499.97 samples/sec Loss 15.9457 LearningRate 0.1861 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:29:17,902-Speed 3503.57 samples/sec Loss 16.1266 LearningRate 0.1860 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:29:20,831-Speed 3497.73 samples/sec Loss 15.9660 LearningRate 0.1860 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:23,763-Speed 3493.11 samples/sec Loss 16.0338 LearningRate 0.1860 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:26,698-Speed 3490.73 samples/sec Loss 15.9274 LearningRate 0.1860 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:29,662-Speed 3455.59 samples/sec Loss 16.0634 LearningRate 0.1859 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:32,596-Speed 3490.76 samples/sec Loss 16.0330 LearningRate 0.1859 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:35,525-Speed 3495.88 samples/sec Loss 15.9112 LearningRate 0.1859 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:38,452-Speed 3499.51 samples/sec Loss 15.9059 LearningRate 0.1858 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:41,405-Speed 3470.06 samples/sec Loss 16.0968 LearningRate 0.1858 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:44,337-Speed 3493.11 samples/sec Loss 15.8988 LearningRate 0.1858 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:47,277-Speed 3484.84 samples/sec Loss 15.9016 LearningRate 0.1857 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:29:50,207-Speed 3496.96 samples/sec Loss 
15.7819 LearningRate 0.1857 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:29:53,153-Speed 3476.30 samples/sec Loss 15.8435 LearningRate 0.1857 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:29:56,084-Speed 3495.77 samples/sec Loss 15.9002 LearningRate 0.1857 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:29:59,037-Speed 3469.10 samples/sec Loss 15.8412 LearningRate 0.1856 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:02,049-Speed 3400.16 samples/sec Loss 15.8745 LearningRate 0.1856 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:04,991-Speed 3482.09 samples/sec Loss 15.9535 LearningRate 0.1856 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:07,925-Speed 3490.54 samples/sec Loss 15.7979 LearningRate 0.1855 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:10,851-Speed 3500.72 samples/sec Loss 15.9561 LearningRate 0.1855 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:13,785-Speed 3491.08 samples/sec Loss 15.8933 LearningRate 0.1855 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:16,713-Speed 3498.91 samples/sec Loss 15.9427 LearningRate 0.1854 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:19,639-Speed 3500.49 samples/sec Loss 15.8761 LearningRate 0.1854 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:22,567-Speed 3497.68 samples/sec Loss 15.9123 LearningRate 0.1854 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:25,493-Speed 3500.72 samples/sec Loss 15.9938 LearningRate 0.1853 Epoch: 0 Global 
Step: 4720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:28,438-Speed 3478.55 samples/sec Loss 15.7625 LearningRate 0.1853 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:30:31,366-Speed 3497.68 samples/sec Loss 15.9585 LearningRate 0.1853 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:30:34,432-Speed 3340.73 samples/sec Loss 15.8006 LearningRate 0.1853 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:30:37,351-Speed 3508.90 samples/sec Loss 15.9184 LearningRate 0.1852 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:40,305-Speed 3467.12 samples/sec Loss 15.9690 LearningRate 0.1852 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:43,299-Speed 3479.20 samples/sec Loss 15.7622 LearningRate 0.1852 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:46,303-Speed 3409.55 samples/sec Loss 16.0575 LearningRate 0.1851 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:49,289-Speed 3491.44 samples/sec Loss 15.9255 LearningRate 0.1851 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:52,219-Speed 3495.53 samples/sec Loss 15.7319 LearningRate 0.1851 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:55,150-Speed 3493.64 samples/sec Loss 15.6922 LearningRate 0.1850 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:30:58,097-Speed 3475.88 samples/sec Loss 15.7505 LearningRate 0.1850 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:01,032-Speed 3490.12 samples/sec Loss 15.7421 LearningRate 0.1850 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-19 17:31:03,964-Speed 3494.16 samples/sec Loss 15.6733 LearningRate 0.1850 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:06,891-Speed 3499.37 samples/sec Loss 15.6493 LearningRate 0.1849 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:09,838-Speed 3475.27 samples/sec Loss 15.5024 LearningRate 0.1849 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:12,763-Speed 3502.02 samples/sec Loss 15.5794 LearningRate 0.1849 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:15,688-Speed 3500.94 samples/sec Loss 15.7889 LearningRate 0.1848 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:18,624-Speed 3489.35 samples/sec Loss 15.6595 LearningRate 0.1848 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:21,557-Speed 3492.36 samples/sec Loss 15.8520 LearningRate 0.1848 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:24,496-Speed 3485.04 samples/sec Loss 15.4562 LearningRate 0.1847 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:27,427-Speed 3494.87 samples/sec Loss 15.7067 LearningRate 0.1847 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:30,357-Speed 3495.49 samples/sec Loss 15.4859 LearningRate 0.1847 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:33,298-Speed 3483.34 samples/sec Loss 15.5298 LearningRate 0.1846 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:31:36,222-Speed 3502.16 samples/sec Loss 15.7191 LearningRate 0.1846 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 
17:31:39,147-Speed 3501.67 samples/sec Loss 15.7698 LearningRate 0.1846 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:31:42,078-Speed 3495.23 samples/sec Loss 15.4959 LearningRate 0.1846 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:31:45,058-Speed 3436.98 samples/sec Loss 15.6496 LearningRate 0.1845 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:31:48,038-Speed 3437.17 samples/sec Loss 15.5096 LearningRate 0.1845 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:31:50,987-Speed 3473.19 samples/sec Loss 15.7866 LearningRate 0.1845 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:31:53,928-Speed 3482.52 samples/sec Loss 15.5826 LearningRate 0.1844 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:31:56,868-Speed 3484.44 samples/sec Loss 15.7907 LearningRate 0.1844 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:31:59,796-Speed 3497.58 samples/sec Loss 15.5379 LearningRate 0.1844 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:32:02,831-Speed 3374.97 samples/sec Loss 15.5182 LearningRate 0.1843 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:17,197-Speed 712.85 samples/sec Loss 15.4254 LearningRate 0.1843 Epoch: 1 Global Step: 5060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:20,176-Speed 3439.05 samples/sec Loss 14.6733 LearningRate 0.1843 Epoch: 1 Global Step: 5070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:23,127-Speed 3470.39 samples/sec Loss 14.5898 LearningRate 0.1843 Epoch: 1 Global Step: 5080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:26,164-Speed 3372.62 samples/sec Loss 
14.7043 LearningRate 0.1842 Epoch: 1 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:29,144-Speed 3438.16 samples/sec Loss 14.9079 LearningRate 0.1842 Epoch: 1 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:32,086-Speed 3481.79 samples/sec Loss 14.5782 LearningRate 0.1842 Epoch: 1 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:35,067-Speed 3436.66 samples/sec Loss 14.7794 LearningRate 0.1841 Epoch: 1 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:38,055-Speed 3428.72 samples/sec Loss 14.9140 LearningRate 0.1841 Epoch: 1 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:41,036-Speed 3435.69 samples/sec Loss 14.9294 LearningRate 0.1841 Epoch: 1 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:32:43,985-Speed 3474.04 samples/sec Loss 14.5478 LearningRate 0.1840 Epoch: 1 Global Step: 5150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:32:46,934-Speed 3473.38 samples/sec Loss 14.9824 LearningRate 0.1840 Epoch: 1 Global Step: 5160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:32:49,923-Speed 3427.32 samples/sec Loss 14.9347 LearningRate 0.1840 Epoch: 1 Global Step: 5170 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:32:52,887-Speed 3456.08 samples/sec Loss 14.9949 LearningRate 0.1839 Epoch: 1 Global Step: 5180 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:32:55,820-Speed 3492.30 samples/sec Loss 14.7715 LearningRate 0.1839 Epoch: 1 Global Step: 5190 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:32:58,764-Speed 3480.17 samples/sec Loss 15.0416 LearningRate 0.1839 Epoch: 1 Global Step: 5200 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:33:01,687-Speed 3504.33 samples/sec Loss 15.0191 LearningRate 0.1839 Epoch: 1 Global 
Step: 5210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:04,614-Speed 3499.13 samples/sec Loss 14.9724 LearningRate 0.1838 Epoch: 1 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:07,561-Speed 3476.08 samples/sec Loss 14.9594 LearningRate 0.1838 Epoch: 1 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:10,486-Speed 3500.99 samples/sec Loss 14.9081 LearningRate 0.1838 Epoch: 1 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:13,419-Speed 3492.83 samples/sec Loss 14.9538 LearningRate 0.1837 Epoch: 1 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:16,428-Speed 3403.61 samples/sec Loss 15.1845 LearningRate 0.1837 Epoch: 1 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:19,357-Speed 3497.54 samples/sec Loss 14.9632 LearningRate 0.1837 Epoch: 1 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:22,286-Speed 3497.98 samples/sec Loss 15.0005 LearningRate 0.1836 Epoch: 1 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:25,223-Speed 3487.12 samples/sec Loss 15.1501 LearningRate 0.1836 Epoch: 1 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:28,154-Speed 3494.49 samples/sec Loss 15.0141 LearningRate 0.1836 Epoch: 1 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:33:31,104-Speed 3472.34 samples/sec Loss 14.9778 LearningRate 0.1836 Epoch: 1 Global Step: 5310 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:33:34,059-Speed 3465.88 samples/sec Loss 15.2427 LearningRate 0.1835 Epoch: 1 Global Step: 5320 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:33:36,999-Speed 3484.68 samples/sec Loss 15.1143 LearningRate 0.1835 Epoch: 1 Global Step: 5330 Fp16 Grad Scale: 262144 
Required: 12 hours Training: 2022-01-19 17:33:39,935-Speed 3488.63 samples/sec Loss 15.0552 LearningRate 0.1835 Epoch: 1 Global Step: 5340 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:33:42,850-Speed 3514.00 samples/sec Loss 15.0099 LearningRate 0.1834 Epoch: 1 Global Step: 5350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:33:45,782-Speed 3492.17 samples/sec Loss 15.0755 LearningRate 0.1834 Epoch: 1 Global Step: 5360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:33:48,731-Speed 3474.35 samples/sec Loss 14.9358 LearningRate 0.1834 Epoch: 1 Global Step: 5370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:33:51,694-Speed 3458.43 samples/sec Loss 15.0549 LearningRate 0.1833 Epoch: 1 Global Step: 5380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:33:54,637-Speed 3479.72 samples/sec Loss 15.2286 LearningRate 0.1833 Epoch: 1 Global Step: 5390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:33:57,616-Speed 3438.47 samples/sec Loss 15.0803 LearningRate 0.1833 Epoch: 1 Global Step: 5400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:34:00,608-Speed 3422.67 samples/sec Loss 15.2122 LearningRate 0.1833 Epoch: 1 Global Step: 5410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:34:03,551-Speed 3480.78 samples/sec Loss 15.3214 LearningRate 0.1832 Epoch: 1 Global Step: 5420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:34:06,501-Speed 3471.97 samples/sec Loss 15.2155 LearningRate 0.1832 Epoch: 1 Global Step: 5430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:34:09,444-Speed 3480.61 samples/sec Loss 14.9870 LearningRate 0.1832 Epoch: 1 Global Step: 5440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:34:12,392-Speed 3474.91 samples/sec Loss 15.0348 LearningRate 0.1831 Epoch: 1 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 
17:34:15,338-Speed 3476.52 samples/sec Loss 15.3086 LearningRate 0.1831 Epoch: 1 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:18,276-Speed 3485.65 samples/sec Loss 14.9643 LearningRate 0.1831 Epoch: 1 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:21,220-Speed 3479.69 samples/sec Loss 15.2277 LearningRate 0.1830 Epoch: 1 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:24,178-Speed 3462.81 samples/sec Loss 15.0092 LearningRate 0.1830 Epoch: 1 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:27,128-Speed 3472.71 samples/sec Loss 15.1087 LearningRate 0.1830 Epoch: 1 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:30,072-Speed 3478.57 samples/sec Loss 15.2373 LearningRate 0.1829 Epoch: 1 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:33,011-Speed 3485.36 samples/sec Loss 15.2685 LearningRate 0.1829 Epoch: 1 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:35,959-Speed 3473.90 samples/sec Loss 14.8415 LearningRate 0.1829 Epoch: 1 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:38,895-Speed 3489.85 samples/sec Loss 15.1221 LearningRate 0.1829 Epoch: 1 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:41,866-Speed 3447.34 samples/sec Loss 14.9756 LearningRate 0.1828 Epoch: 1 Global Step: 5550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:34:44,814-Speed 3474.83 samples/sec Loss 15.1417 LearningRate 0.1828 Epoch: 1 Global Step: 5560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:34:47,781-Speed 3451.78 samples/sec Loss 15.0392 LearningRate 0.1828 Epoch: 1 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:50,758-Speed 3440.33 samples/sec Loss 
15.1190 LearningRate 0.1827 Epoch: 1 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:53,723-Speed 3454.13 samples/sec Loss 15.0978 LearningRate 0.1827 Epoch: 1 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:56,676-Speed 3468.84 samples/sec Loss 14.8481 LearningRate 0.1827 Epoch: 1 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:34:59,612-Speed 3489.47 samples/sec Loss 15.2601 LearningRate 0.1826 Epoch: 1 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:02,550-Speed 3486.31 samples/sec Loss 15.1547 LearningRate 0.1826 Epoch: 1 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:05,500-Speed 3471.68 samples/sec Loss 15.0202 LearningRate 0.1826 Epoch: 1 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:08,436-Speed 3488.71 samples/sec Loss 15.1518 LearningRate 0.1826 Epoch: 1 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:11,368-Speed 3493.01 samples/sec Loss 15.1046 LearningRate 0.1825 Epoch: 1 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:14,309-Speed 3483.19 samples/sec Loss 15.0049 LearningRate 0.1825 Epoch: 1 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:17,265-Speed 3464.77 samples/sec Loss 15.2283 LearningRate 0.1825 Epoch: 1 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:20,227-Speed 3458.06 samples/sec Loss 15.0778 LearningRate 0.1824 Epoch: 1 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:23,162-Speed 3489.45 samples/sec Loss 14.9939 LearningRate 0.1824 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:26,108-Speed 3476.92 samples/sec Loss 15.0235 LearningRate 0.1824 Epoch: 1 Global 
Step: 5700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:29,044-Speed 3489.25 samples/sec Loss 15.1008 LearningRate 0.1823 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:31,975-Speed 3494.37 samples/sec Loss 15.2466 LearningRate 0.1823 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:34,941-Speed 3453.41 samples/sec Loss 15.1190 LearningRate 0.1823 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:37,940-Speed 3415.77 samples/sec Loss 15.2186 LearningRate 0.1823 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:40,968-Speed 3382.51 samples/sec Loss 14.8990 LearningRate 0.1822 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:44,001-Speed 3376.42 samples/sec Loss 15.0535 LearningRate 0.1822 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:35:46,960-Speed 3461.26 samples/sec Loss 15.0920 LearningRate 0.1822 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:35:49,924-Speed 3456.13 samples/sec Loss 14.9200 LearningRate 0.1821 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:35:52,860-Speed 3489.28 samples/sec Loss 14.9810 LearningRate 0.1821 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:35:55,879-Speed 3392.71 samples/sec Loss 15.0299 LearningRate 0.1821 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:35:58,819-Speed 3484.32 samples/sec Loss 14.7725 LearningRate 0.1820 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:36:01,770-Speed 3470.52 samples/sec Loss 14.7894 LearningRate 0.1820 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 262144 
Required: 12 hours Training: 2022-01-19 17:36:04,689-Speed 3509.44 samples/sec Loss 14.8737 LearningRate 0.1820 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:07,630-Speed 3482.39 samples/sec Loss 14.9525 LearningRate 0.1820 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:10,577-Speed 3475.64 samples/sec Loss 14.8761 LearningRate 0.1819 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:13,537-Speed 3460.17 samples/sec Loss 15.0104 LearningRate 0.1819 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:16,488-Speed 3471.40 samples/sec Loss 15.0219 LearningRate 0.1819 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:19,433-Speed 3478.17 samples/sec Loss 14.9576 LearningRate 0.1818 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:22,376-Speed 3480.06 samples/sec Loss 15.0283 LearningRate 0.1818 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:25,317-Speed 3483.38 samples/sec Loss 15.0742 LearningRate 0.1818 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:28,263-Speed 3476.62 samples/sec Loss 14.9689 LearningRate 0.1817 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:31,196-Speed 3492.23 samples/sec Loss 14.9539 LearningRate 0.1817 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:36:34,127-Speed 3494.67 samples/sec Loss 14.9823 LearningRate 0.1817 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:36:37,057-Speed 3496.21 samples/sec Loss 15.0419 LearningRate 0.1817 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 
17:36:39,989-Speed 3492.48 samples/sec Loss 14.9361 LearningRate 0.1816 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:36:42,927-Speed 3487.57 samples/sec Loss 15.0305 LearningRate 0.1816 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:36:45,870-Speed 3480.81 samples/sec Loss 14.9780 LearningRate 0.1816 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:36:48,808-Speed 3485.40 samples/sec Loss 14.9933 LearningRate 0.1815 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:36:51,792-Speed 3432.53 samples/sec Loss 14.8063 LearningRate 0.1815 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:36:54,718-Speed 3501.22 samples/sec Loss 14.9912 LearningRate 0.1815 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:37:37,811-[lfw][6000]XNorm: 22.726149 Training: 2022-01-19 17:37:37,812-[lfw][6000]Accuracy-Flip: 0.99133+-0.00407 Training: 2022-01-19 17:37:37,812-[lfw][6000]Accuracy-Highest: 0.99133 Training: 2022-01-19 17:38:27,954-[cfp_fp][6000]XNorm: 19.488672 Training: 2022-01-19 17:38:27,954-[cfp_fp][6000]Accuracy-Flip: 0.89057+-0.02013 Training: 2022-01-19 17:38:27,955-[cfp_fp][6000]Accuracy-Highest: 0.89057 Training: 2022-01-19 17:39:11,262-[agedb_30][6000]XNorm: 22.178072 Training: 2022-01-19 17:39:11,263-[agedb_30][6000]Accuracy-Flip: 0.94817+-0.01102 Training: 2022-01-19 17:39:11,264-[agedb_30][6000]Accuracy-Highest: 0.94817 Training: 2022-01-19 17:39:14,196-Speed 73.42 samples/sec Loss 15.0312 LearningRate 0.1814 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:17,119-Speed 3504.00 samples/sec Loss 15.1037 LearningRate 0.1814 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:20,063-Speed 3478.59 samples/sec Loss 
14.9914 LearningRate 0.1814 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:23,003-Speed 3483.61 samples/sec Loss 14.9001 LearningRate 0.1813 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:25,935-Speed 3493.90 samples/sec Loss 15.1305 LearningRate 0.1813 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:28,882-Speed 3475.31 samples/sec Loss 14.7634 LearningRate 0.1813 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:31,824-Speed 3481.44 samples/sec Loss 15.1722 LearningRate 0.1813 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:34,751-Speed 3499.79 samples/sec Loss 14.8626 LearningRate 0.1812 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:37,714-Speed 3457.48 samples/sec Loss 14.9619 LearningRate 0.1812 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:40,664-Speed 3471.22 samples/sec Loss 14.9738 LearningRate 0.1812 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:39:43,608-Speed 3479.59 samples/sec Loss 14.7249 LearningRate 0.1811 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:46,641-Speed 3377.78 samples/sec Loss 14.6938 LearningRate 0.1811 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:49,571-Speed 3494.94 samples/sec Loss 14.8338 LearningRate 0.1811 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:52,515-Speed 3479.51 samples/sec Loss 14.9499 LearningRate 0.1810 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:55,456-Speed 3482.44 samples/sec Loss 14.8725 LearningRate 0.1810 Epoch: 1 Global 
Step: 6150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:39:58,393-Speed 3489.18 samples/sec Loss 15.0131 LearningRate 0.1810 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:40:01,443-Speed 3358.34 samples/sec Loss 14.9456 LearningRate 0.1810 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:40:04,382-Speed 3484.69 samples/sec Loss 14.9321 LearningRate 0.1809 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:40:07,407-Speed 3386.47 samples/sec Loss 14.7156 LearningRate 0.1809 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:40:10,377-Speed 3448.17 samples/sec Loss 14.8869 LearningRate 0.1809 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:40:13,471-Speed 3310.34 samples/sec Loss 14.9450 LearningRate 0.1808 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:40:16,567-Speed 3308.79 samples/sec Loss 14.8610 LearningRate 0.1808 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:40:19,529-Speed 3457.52 samples/sec Loss 14.8181 LearningRate 0.1808 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:40:22,474-Speed 3478.46 samples/sec Loss 14.7306 LearningRate 0.1807 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:40:25,395-Speed 3506.53 samples/sec Loss 14.8228 LearningRate 0.1807 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:28,326-Speed 3496.44 samples/sec Loss 14.8862 LearningRate 0.1807 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:31,256-Speed 3495.67 samples/sec Loss 14.9275 LearningRate 0.1807 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 65536 Required: 
12 hours Training: 2022-01-19 17:40:34,203-Speed 3475.83 samples/sec Loss 14.7771 LearningRate 0.1806 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:37,145-Speed 3481.59 samples/sec Loss 14.7938 LearningRate 0.1806 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:40,085-Speed 3483.55 samples/sec Loss 14.9396 LearningRate 0.1806 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:43,029-Speed 3479.48 samples/sec Loss 14.5230 LearningRate 0.1805 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:46,001-Speed 3446.08 samples/sec Loss 14.8521 LearningRate 0.1805 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:48,951-Speed 3472.56 samples/sec Loss 14.7966 LearningRate 0.1805 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:51,881-Speed 3495.23 samples/sec Loss 14.7447 LearningRate 0.1804 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:40:54,806-Speed 3501.92 samples/sec Loss 14.6590 LearningRate 0.1804 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:40:57,797-Speed 3425.26 samples/sec Loss 14.6133 LearningRate 0.1804 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:00,800-Speed 3410.22 samples/sec Loss 14.7559 LearningRate 0.1804 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:03,738-Speed 3487.43 samples/sec Loss 14.7289 LearningRate 0.1803 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:06,665-Speed 3498.53 samples/sec Loss 14.9145 LearningRate 0.1803 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:09,592-Speed 
3499.56 samples/sec Loss 14.7914 LearningRate 0.1803 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:12,520-Speed 3497.74 samples/sec Loss 14.9111 LearningRate 0.1802 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:15,448-Speed 3499.05 samples/sec Loss 14.7759 LearningRate 0.1802 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:18,380-Speed 3493.15 samples/sec Loss 14.6600 LearningRate 0.1802 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:21,305-Speed 3502.37 samples/sec Loss 14.8510 LearningRate 0.1801 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:24,290-Speed 3431.76 samples/sec Loss 14.7750 LearningRate 0.1801 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:41:27,254-Speed 3455.44 samples/sec Loss 14.4610 LearningRate 0.1801 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:41:30,236-Speed 3434.23 samples/sec Loss 14.7503 LearningRate 0.1801 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:41:33,210-Speed 3444.18 samples/sec Loss 14.7538 LearningRate 0.1800 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:41:36,140-Speed 3496.40 samples/sec Loss 14.6154 LearningRate 0.1800 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:41:39,074-Speed 3490.83 samples/sec Loss 14.8068 LearningRate 0.1800 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:43,253-Speed 2450.76 samples/sec Loss 14.6148 LearningRate 0.1799 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:46,177-Speed 3502.24 samples/sec Loss 14.5655 
LearningRate 0.1799 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:49,120-Speed 3481.21 samples/sec Loss 14.5133 LearningRate 0.1799 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:52,156-Speed 3373.35 samples/sec Loss 14.6186 LearningRate 0.1798 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:55,095-Speed 3484.95 samples/sec Loss 14.5270 LearningRate 0.1798 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:41:58,022-Speed 3500.58 samples/sec Loss 14.5522 LearningRate 0.1798 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:00,951-Speed 3496.99 samples/sec Loss 14.6686 LearningRate 0.1798 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:03,885-Speed 3491.36 samples/sec Loss 14.5436 LearningRate 0.1797 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:06,811-Speed 3499.72 samples/sec Loss 14.6152 LearningRate 0.1797 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:09,749-Speed 3486.42 samples/sec Loss 14.6124 LearningRate 0.1797 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:42:12,695-Speed 3476.97 samples/sec Loss 14.9242 LearningRate 0.1796 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:42:15,625-Speed 3495.45 samples/sec Loss 14.6044 LearningRate 0.1796 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:42:18,552-Speed 3500.04 samples/sec Loss 14.6826 LearningRate 0.1796 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:42:21,480-Speed 3498.59 samples/sec Loss 14.7052 LearningRate 0.1795 Epoch: 1 Global Step: 
6640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:42:24,409-Speed 3497.06 samples/sec Loss 14.7108 LearningRate 0.1795 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:42:27,338-Speed 3496.40 samples/sec Loss 14.4993 LearningRate 0.1795 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:42:30,272-Speed 3491.27 samples/sec Loss 14.6364 LearningRate 0.1795 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:33,198-Speed 3500.01 samples/sec Loss 14.5726 LearningRate 0.1794 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:36,144-Speed 3477.07 samples/sec Loss 14.7166 LearningRate 0.1794 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:39,095-Speed 3472.05 samples/sec Loss 14.6609 LearningRate 0.1794 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:42,022-Speed 3499.73 samples/sec Loss 14.4817 LearningRate 0.1793 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:44,959-Speed 3487.30 samples/sec Loss 14.4420 LearningRate 0.1793 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:47,883-Speed 3502.56 samples/sec Loss 14.6654 LearningRate 0.1793 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:50,816-Speed 3492.48 samples/sec Loss 14.4683 LearningRate 0.1792 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:53,750-Speed 3491.86 samples/sec Loss 14.4539 LearningRate 0.1792 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:42:56,718-Speed 3450.97 samples/sec Loss 14.5934 LearningRate 0.1792 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-01-19 17:42:59,697-Speed 3437.17 samples/sec Loss 14.6512 LearningRate 0.1792 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:02,623-Speed 3501.63 samples/sec Loss 14.6892 LearningRate 0.1791 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:05,554-Speed 3494.58 samples/sec Loss 14.3782 LearningRate 0.1791 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:08,493-Speed 3484.26 samples/sec Loss 14.5127 LearningRate 0.1791 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:11,426-Speed 3492.71 samples/sec Loss 14.4302 LearningRate 0.1790 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:14,355-Speed 3498.03 samples/sec Loss 14.5536 LearningRate 0.1790 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:17,320-Speed 3453.77 samples/sec Loss 14.5991 LearningRate 0.1790 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:20,293-Speed 3445.61 samples/sec Loss 14.5024 LearningRate 0.1789 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:23,243-Speed 3471.63 samples/sec Loss 14.6064 LearningRate 0.1789 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:26,174-Speed 3494.89 samples/sec Loss 14.5120 LearningRate 0.1789 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:29,128-Speed 3467.24 samples/sec Loss 14.2659 LearningRate 0.1789 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:32,074-Speed 3477.34 samples/sec Loss 14.2710 LearningRate 0.1788 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 
17:43:35,003-Speed 3496.55 samples/sec Loss 14.5714 LearningRate 0.1788 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:43:37,968-Speed 3454.61 samples/sec Loss 14.5381 LearningRate 0.1788 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:40,920-Speed 3470.40 samples/sec Loss 14.7127 LearningRate 0.1787 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:43,858-Speed 3486.78 samples/sec Loss 14.5021 LearningRate 0.1787 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:46,785-Speed 3498.31 samples/sec Loss 14.4235 LearningRate 0.1787 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:49,740-Speed 3467.12 samples/sec Loss 14.3082 LearningRate 0.1786 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:52,670-Speed 3495.77 samples/sec Loss 14.4843 LearningRate 0.1786 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:55,607-Speed 3486.74 samples/sec Loss 14.5389 LearningRate 0.1786 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:43:58,537-Speed 3496.81 samples/sec Loss 14.5305 LearningRate 0.1786 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:01,525-Speed 3427.45 samples/sec Loss 14.4197 LearningRate 0.1785 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:04,485-Speed 3460.76 samples/sec Loss 14.2711 LearningRate 0.1785 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:07,450-Speed 3455.37 samples/sec Loss 14.6465 LearningRate 0.1785 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:10,391-Speed 3482.72 samples/sec Loss 
14.4242 LearningRate 0.1784 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:13,321-Speed 3495.10 samples/sec Loss 14.6181 LearningRate 0.1784 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:16,258-Speed 3487.98 samples/sec Loss 14.4330 LearningRate 0.1784 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:19,192-Speed 3490.81 samples/sec Loss 14.4307 LearningRate 0.1784 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:22,122-Speed 3495.98 samples/sec Loss 14.1016 LearningRate 0.1783 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:25,070-Speed 3474.01 samples/sec Loss 14.5332 LearningRate 0.1783 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:44:28,003-Speed 3492.77 samples/sec Loss 14.4153 LearningRate 0.1783 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:30,974-Speed 3447.45 samples/sec Loss 14.3721 LearningRate 0.1782 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:33,910-Speed 3489.62 samples/sec Loss 14.4738 LearningRate 0.1782 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:36,855-Speed 3478.14 samples/sec Loss 14.6609 LearningRate 0.1782 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:39,894-Speed 3369.95 samples/sec Loss 14.3941 LearningRate 0.1781 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:42,847-Speed 3469.42 samples/sec Loss 14.4798 LearningRate 0.1781 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:45,844-Speed 3418.16 samples/sec Loss 14.4374 LearningRate 0.1781 Epoch: 1 Global 
Step: 7130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:48,775-Speed 3494.41 samples/sec Loss 14.2423 LearningRate 0.1781 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:51,736-Speed 3458.46 samples/sec Loss 14.6066 LearningRate 0.1780 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:54,712-Speed 3442.89 samples/sec Loss 14.4168 LearningRate 0.1780 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:44:57,646-Speed 3491.04 samples/sec Loss 14.4290 LearningRate 0.1780 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:45:00,590-Speed 3478.90 samples/sec Loss 14.2216 LearningRate 0.1779 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:45:03,510-Speed 3508.59 samples/sec Loss 14.3919 LearningRate 0.1779 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:06,490-Speed 3436.98 samples/sec Loss 14.4228 LearningRate 0.1779 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:09,421-Speed 3495.04 samples/sec Loss 14.4032 LearningRate 0.1778 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:12,355-Speed 3490.21 samples/sec Loss 14.5983 LearningRate 0.1778 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:15,285-Speed 3495.25 samples/sec Loss 14.3073 LearningRate 0.1778 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:18,217-Speed 3494.44 samples/sec Loss 14.3809 LearningRate 0.1778 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:21,149-Speed 3492.62 samples/sec Loss 14.3448 LearningRate 0.1777 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-19 17:45:24,080-Speed 3494.62 samples/sec Loss 14.4274 LearningRate 0.1777 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:27,025-Speed 3478.69 samples/sec Loss 14.2351 LearningRate 0.1777 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:29,959-Speed 3490.72 samples/sec Loss 14.1977 LearningRate 0.1776 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:32,882-Speed 3504.58 samples/sec Loss 14.2753 LearningRate 0.1776 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:35,815-Speed 3491.86 samples/sec Loss 14.4848 LearningRate 0.1776 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:38,748-Speed 3493.05 samples/sec Loss 14.3737 LearningRate 0.1775 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:41,683-Speed 3489.47 samples/sec Loss 14.3406 LearningRate 0.1775 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:44,628-Speed 3478.70 samples/sec Loss 14.4110 LearningRate 0.1775 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:47,557-Speed 3496.62 samples/sec Loss 14.2899 LearningRate 0.1775 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:50,507-Speed 3471.73 samples/sec Loss 14.2827 LearningRate 0.1774 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:45:53,432-Speed 3502.29 samples/sec Loss 14.0576 LearningRate 0.1774 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:45:56,365-Speed 3492.58 samples/sec Loss 14.3757 LearningRate 0.1774 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 
17:45:59,340-Speed 3442.93 samples/sec Loss 14.3968 LearningRate 0.1773 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:46:02,288-Speed 3473.96 samples/sec Loss 14.2776 LearningRate 0.1773 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:46:05,230-Speed 3481.61 samples/sec Loss 14.3048 LearningRate 0.1773 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:46:08,158-Speed 3497.99 samples/sec Loss 14.2411 LearningRate 0.1772 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:46:11,100-Speed 3482.24 samples/sec Loss 14.3528 LearningRate 0.1772 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:46:14,050-Speed 3471.65 samples/sec Loss 14.2173 LearningRate 0.1772 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:46:16,983-Speed 3491.98 samples/sec Loss 14.2353 LearningRate 0.1772 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:46:19,919-Speed 3488.92 samples/sec Loss 14.1342 LearningRate 0.1771 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-19 17:46:22,927-Speed 3404.88 samples/sec Loss 14.3531 LearningRate 0.1771 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:25,860-Speed 3492.19 samples/sec Loss 14.3096 LearningRate 0.1771 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:28,823-Speed 3458.19 samples/sec Loss 13.9934 LearningRate 0.1770 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:31,757-Speed 3490.86 samples/sec Loss 14.3722 LearningRate 0.1770 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:34,692-Speed 3492.33 samples/sec Loss 14.3153 
LearningRate 0.1770 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:37,626-Speed 3490.36 samples/sec Loss 14.2799 LearningRate 0.1769 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:40,583-Speed 3463.45 samples/sec Loss 14.1126 LearningRate 0.1769 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:43,519-Speed 3489.79 samples/sec Loss 14.0463 LearningRate 0.1769 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:46,486-Speed 3451.56 samples/sec Loss 14.2289 LearningRate 0.1769 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:49,418-Speed 3494.72 samples/sec Loss 14.1729 LearningRate 0.1768 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:46:52,349-Speed 3495.05 samples/sec Loss 14.4218 LearningRate 0.1768 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:46:55,342-Speed 3422.00 samples/sec Loss 14.3216 LearningRate 0.1768 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:46:58,350-Speed 3405.15 samples/sec Loss 14.3792 LearningRate 0.1767 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:01,311-Speed 3458.44 samples/sec Loss 13.9378 LearningRate 0.1767 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:04,243-Speed 3494.30 samples/sec Loss 14.3203 LearningRate 0.1767 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:07,174-Speed 3493.97 samples/sec Loss 14.3219 LearningRate 0.1767 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:10,104-Speed 3495.22 samples/sec Loss 14.1164 LearningRate 0.1766 Epoch: 1 Global Step: 7620 
Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:13,039-Speed 3489.80 samples/sec Loss 14.1407 LearningRate 0.1766 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:15,990-Speed 3470.74 samples/sec Loss 14.1208 LearningRate 0.1766 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:19,068-Speed 3330.21 samples/sec Loss 14.1445 LearningRate 0.1765 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:22,145-Speed 3328.06 samples/sec Loss 14.1171 LearningRate 0.1765 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:47:25,082-Speed 3487.82 samples/sec Loss 14.2429 LearningRate 0.1765 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:28,041-Speed 3461.48 samples/sec Loss 14.2021 LearningRate 0.1764 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:30,978-Speed 3487.90 samples/sec Loss 14.0368 LearningRate 0.1764 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:33,911-Speed 3491.04 samples/sec Loss 14.1389 LearningRate 0.1764 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:36,846-Speed 3489.74 samples/sec Loss 14.0205 LearningRate 0.1764 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:39,780-Speed 3492.21 samples/sec Loss 14.0617 LearningRate 0.1763 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:42,732-Speed 3469.60 samples/sec Loss 14.0884 LearningRate 0.1763 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:45,663-Speed 3494.21 samples/sec Loss 14.0865 LearningRate 0.1763 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-01-19 17:47:48,597-Speed 3493.09 samples/sec Loss 14.0893 LearningRate 0.1762 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:51,539-Speed 3481.12 samples/sec Loss 14.1727 LearningRate 0.1762 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:47:54,473-Speed 3490.22 samples/sec Loss 14.0793 LearningRate 0.1762 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:47:57,415-Speed 3481.98 samples/sec Loss 14.2414 LearningRate 0.1761 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:48:00,351-Speed 3489.20 samples/sec Loss 14.1557 LearningRate 0.1761 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:48:03,293-Speed 3480.86 samples/sec Loss 14.0137 LearningRate 0.1761 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:48:06,225-Speed 3493.68 samples/sec Loss 14.1644 LearningRate 0.1761 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:48:09,162-Speed 3487.40 samples/sec Loss 13.9802 LearningRate 0.1760 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:48:12,099-Speed 3487.34 samples/sec Loss 14.0607 LearningRate 0.1760 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:48:15,049-Speed 3476.31 samples/sec Loss 14.1966 LearningRate 0.1760 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:48:17,999-Speed 3471.12 samples/sec Loss 14.0054 LearningRate 0.1759 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:48:20,933-Speed 3492.14 samples/sec Loss 14.1817 LearningRate 0.1759 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:48:23,877-Speed 3478.57 
samples/sec Loss 13.9900 LearningRate 0.1759 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:48:26,824-Speed 3474.84 samples/sec Loss 14.0509 LearningRate 0.1758 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:48:29,760-Speed 3489.38 samples/sec Loss 13.8570 LearningRate 0.1758 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:48:32,803-Speed 3365.57 samples/sec Loss 14.1510 LearningRate 0.1758 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 17:48:35,754-Speed 3471.76 samples/sec Loss 14.0241 LearningRate 0.1758 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:48:38,754-Speed 3413.53 samples/sec Loss 14.1769 LearningRate 0.1757 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:48:41,725-Speed 3448.41 samples/sec Loss 14.1208 LearningRate 0.1757 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:48:44,684-Speed 3461.12 samples/sec Loss 13.9547 LearningRate 0.1757 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:48:47,616-Speed 3493.37 samples/sec Loss 13.8566 LearningRate 0.1756 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:48:50,553-Speed 3488.25 samples/sec Loss 13.9620 LearningRate 0.1756 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:48:53,528-Speed 3442.57 samples/sec Loss 13.9796 LearningRate 0.1756 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:48:56,465-Speed 3486.74 samples/sec Loss 14.1045 LearningRate 0.1756 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:48:59,481-Speed 3396.53 samples/sec Loss 13.9238 LearningRate 0.1755 
Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:49:02,432-Speed 3471.06 samples/sec Loss 14.0339 LearningRate 0.1755 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:49:45,248-[lfw][8000]XNorm: 23.351768 Training: 2022-01-19 17:49:45,249-[lfw][8000]Accuracy-Flip: 0.99433+-0.00186 Training: 2022-01-19 17:49:45,249-[lfw][8000]Accuracy-Highest: 0.99433 Training: 2022-01-19 17:50:35,377-[cfp_fp][8000]XNorm: 20.380963 Training: 2022-01-19 17:50:35,378-[cfp_fp][8000]Accuracy-Flip: 0.92243+-0.01866 Training: 2022-01-19 17:50:35,379-[cfp_fp][8000]Accuracy-Highest: 0.92243 Training: 2022-01-19 17:51:18,401-[agedb_30][8000]XNorm: 22.391388 Training: 2022-01-19 17:51:18,401-[agedb_30][8000]Accuracy-Flip: 0.95233+-0.01323 Training: 2022-01-19 17:51:18,402-[agedb_30][8000]Accuracy-Highest: 0.95233 Training: 2022-01-19 17:51:21,408-Speed 73.68 samples/sec Loss 14.0813 LearningRate 0.1755 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:51:24,341-Speed 3491.46 samples/sec Loss 14.0896 LearningRate 0.1754 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:27,265-Speed 3503.71 samples/sec Loss 14.2019 LearningRate 0.1754 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:30,200-Speed 3490.20 samples/sec Loss 14.1683 LearningRate 0.1754 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:33,143-Speed 3480.70 samples/sec Loss 13.9754 LearningRate 0.1753 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:36,073-Speed 3495.49 samples/sec Loss 14.0950 LearningRate 0.1753 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:39,014-Speed 3482.58 samples/sec Loss 14.0104 LearningRate 0.1753 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-01-19 17:51:41,944-Speed 3496.01 samples/sec Loss 13.8517 LearningRate 0.1753 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:44,886-Speed 3481.83 samples/sec Loss 14.0270 LearningRate 0.1752 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:47,829-Speed 3480.54 samples/sec Loss 14.1782 LearningRate 0.1752 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:50,757-Speed 3498.44 samples/sec Loss 13.9636 LearningRate 0.1752 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:51:53,686-Speed 3496.39 samples/sec Loss 13.9979 LearningRate 0.1751 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:51:56,617-Speed 3495.02 samples/sec Loss 14.0948 LearningRate 0.1751 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:51:59,553-Speed 3488.03 samples/sec Loss 13.9281 LearningRate 0.1751 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:52:02,562-Speed 3403.92 samples/sec Loss 14.0392 LearningRate 0.1750 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:05,530-Speed 3453.73 samples/sec Loss 13.9635 LearningRate 0.1750 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:08,472-Speed 3481.70 samples/sec Loss 13.8933 LearningRate 0.1750 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:11,467-Speed 3420.00 samples/sec Loss 14.0124 LearningRate 0.1750 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:14,430-Speed 3457.20 samples/sec Loss 13.7898 LearningRate 0.1749 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-01-19 17:52:17,382-Speed 3468.53 samples/sec Loss 14.0618 LearningRate 0.1749 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:20,319-Speed 3488.14 samples/sec Loss 14.1576 LearningRate 0.1749 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:23,282-Speed 3457.17 samples/sec Loss 14.0157 LearningRate 0.1748 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:26,216-Speed 3491.30 samples/sec Loss 13.9985 LearningRate 0.1748 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:29,180-Speed 3455.50 samples/sec Loss 14.2288 LearningRate 0.1748 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:52:32,101-Speed 3506.40 samples/sec Loss 14.0858 LearningRate 0.1748 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:35,052-Speed 3470.47 samples/sec Loss 14.0811 LearningRate 0.1747 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:38,038-Speed 3430.64 samples/sec Loss 14.0197 LearningRate 0.1747 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:40,967-Speed 3496.18 samples/sec Loss 13.8907 LearningRate 0.1747 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:43,913-Speed 3477.63 samples/sec Loss 13.8199 LearningRate 0.1746 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:46,843-Speed 3494.84 samples/sec Loss 14.0074 LearningRate 0.1746 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:49,772-Speed 3497.57 samples/sec Loss 13.9945 LearningRate 0.1746 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:52,704-Speed 3493.19 samples/sec 
Loss 13.8878 LearningRate 0.1745 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:55,635-Speed 3494.66 samples/sec Loss 13.8912 LearningRate 0.1745 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:52:58,577-Speed 3481.93 samples/sec Loss 14.0938 LearningRate 0.1745 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:01,616-Speed 3370.06 samples/sec Loss 13.7274 LearningRate 0.1745 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:04,545-Speed 3497.08 samples/sec Loss 13.8391 LearningRate 0.1744 Epoch: 1 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:07,514-Speed 3449.13 samples/sec Loss 14.1712 LearningRate 0.1744 Epoch: 1 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:10,455-Speed 3483.14 samples/sec Loss 13.9366 LearningRate 0.1744 Epoch: 1 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:13,440-Speed 3431.56 samples/sec Loss 13.9525 LearningRate 0.1743 Epoch: 1 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:16,481-Speed 3368.08 samples/sec Loss 13.7838 LearningRate 0.1743 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:19,421-Speed 3484.15 samples/sec Loss 13.9959 LearningRate 0.1743 Epoch: 1 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:22,348-Speed 3499.05 samples/sec Loss 14.1007 LearningRate 0.1743 Epoch: 1 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:25,269-Speed 3507.14 samples/sec Loss 13.9010 LearningRate 0.1742 Epoch: 1 Global Step: 8430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:28,202-Speed 3491.56 samples/sec Loss 13.9910 LearningRate 0.1742 Epoch: 1 
Global Step: 8440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:31,130-Speed 3499.04 samples/sec Loss 13.7855 LearningRate 0.1742 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:34,059-Speed 3496.82 samples/sec Loss 13.9352 LearningRate 0.1741 Epoch: 1 Global Step: 8460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:36,999-Speed 3484.23 samples/sec Loss 13.9420 LearningRate 0.1741 Epoch: 1 Global Step: 8470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:39,926-Speed 3499.53 samples/sec Loss 13.9170 LearningRate 0.1741 Epoch: 1 Global Step: 8480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:42,858-Speed 3493.30 samples/sec Loss 13.9283 LearningRate 0.1740 Epoch: 1 Global Step: 8490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:45,810-Speed 3468.76 samples/sec Loss 13.7969 LearningRate 0.1740 Epoch: 1 Global Step: 8500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:48,745-Speed 3490.69 samples/sec Loss 13.8240 LearningRate 0.1740 Epoch: 1 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:51,686-Speed 3482.11 samples/sec Loss 13.8269 LearningRate 0.1740 Epoch: 1 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-19 17:53:54,624-Speed 3486.18 samples/sec Loss 13.8527 LearningRate 0.1739 Epoch: 1 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:53:57,556-Speed 3493.24 samples/sec Loss 13.8325 LearningRate 0.1739 Epoch: 1 Global Step: 8540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:00,487-Speed 3495.08 samples/sec Loss 13.7711 LearningRate 0.1739 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:03,414-Speed 3499.94 samples/sec Loss 14.0587 LearningRate 0.1738 Epoch: 1 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 
12 hours Training: 2022-01-19 17:54:06,343-Speed 3496.75 samples/sec Loss 13.8270 LearningRate 0.1738 Epoch: 1 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:09,271-Speed 3498.02 samples/sec Loss 13.9450 LearningRate 0.1738 Epoch: 1 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:12,198-Speed 3498.97 samples/sec Loss 13.9009 LearningRate 0.1738 Epoch: 1 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:15,125-Speed 3500.05 samples/sec Loss 13.8050 LearningRate 0.1737 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:18,052-Speed 3499.59 samples/sec Loss 13.9871 LearningRate 0.1737 Epoch: 1 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:20,997-Speed 3477.11 samples/sec Loss 13.6797 LearningRate 0.1737 Epoch: 1 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:23,930-Speed 3492.44 samples/sec Loss 13.7799 LearningRate 0.1736 Epoch: 1 Global Step: 8630 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:54:26,864-Speed 3491.77 samples/sec Loss 14.0257 LearningRate 0.1736 Epoch: 1 Global Step: 8640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:54:29,799-Speed 3489.67 samples/sec Loss 13.9065 LearningRate 0.1736 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:54:32,726-Speed 3498.82 samples/sec Loss 13.8022 LearningRate 0.1735 Epoch: 1 Global Step: 8660 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:54:35,711-Speed 3432.09 samples/sec Loss 13.7903 LearningRate 0.1735 Epoch: 1 Global Step: 8670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:54:38,672-Speed 3459.05 samples/sec Loss 13.8744 LearningRate 0.1735 Epoch: 1 Global Step: 8680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 
17:54:41,612-Speed 3483.22 samples/sec Loss 13.7961 LearningRate 0.1735 Epoch: 1 Global Step: 8690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:54:44,543-Speed 3494.58 samples/sec Loss 13.8247 LearningRate 0.1734 Epoch: 1 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:47,542-Speed 3415.30 samples/sec Loss 13.8714 LearningRate 0.1734 Epoch: 1 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:50,557-Speed 3397.99 samples/sec Loss 13.8290 LearningRate 0.1734 Epoch: 1 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:53,488-Speed 3493.79 samples/sec Loss 13.7970 LearningRate 0.1733 Epoch: 1 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:56,423-Speed 3491.01 samples/sec Loss 13.9242 LearningRate 0.1733 Epoch: 1 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:54:59,363-Speed 3484.10 samples/sec Loss 13.8795 LearningRate 0.1733 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:02,364-Speed 3413.25 samples/sec Loss 13.7649 LearningRate 0.1732 Epoch: 1 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:05,405-Speed 3367.53 samples/sec Loss 13.6452 LearningRate 0.1732 Epoch: 1 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:08,347-Speed 3482.22 samples/sec Loss 13.7931 LearningRate 0.1732 Epoch: 1 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:11,280-Speed 3491.10 samples/sec Loss 13.6153 LearningRate 0.1732 Epoch: 1 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:14,209-Speed 3497.34 samples/sec Loss 14.0116 LearningRate 0.1731 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:55:17,164-Speed 3466.36 samples/sec Loss 
13.7169 LearningRate 0.1731 Epoch: 1 Global Step: 8810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:55:20,108-Speed 3478.71 samples/sec Loss 13.6462 LearningRate 0.1731 Epoch: 1 Global Step: 8820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:55:23,089-Speed 3436.29 samples/sec Loss 13.5702 LearningRate 0.1730 Epoch: 1 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:55:26,028-Speed 3485.99 samples/sec Loss 13.7234 LearningRate 0.1730 Epoch: 1 Global Step: 8840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:55:28,966-Speed 3485.76 samples/sec Loss 13.8913 LearningRate 0.1730 Epoch: 1 Global Step: 8850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:55:31,897-Speed 3494.14 samples/sec Loss 13.5639 LearningRate 0.1730 Epoch: 1 Global Step: 8860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:55:34,832-Speed 3489.99 samples/sec Loss 13.8566 LearningRate 0.1729 Epoch: 1 Global Step: 8870 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:55:37,755-Speed 3504.30 samples/sec Loss 13.7160 LearningRate 0.1729 Epoch: 1 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:40,708-Speed 3468.09 samples/sec Loss 13.7526 LearningRate 0.1729 Epoch: 1 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:43,638-Speed 3496.15 samples/sec Loss 13.6414 LearningRate 0.1728 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:46,576-Speed 3485.68 samples/sec Loss 13.9469 LearningRate 0.1728 Epoch: 1 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:49,511-Speed 3491.30 samples/sec Loss 13.8396 LearningRate 0.1728 Epoch: 1 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:52,444-Speed 3492.18 samples/sec Loss 13.6718 LearningRate 0.1727 Epoch: 1 Global 
Step: 8930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:55,377-Speed 3492.68 samples/sec Loss 13.7771 LearningRate 0.1727 Epoch: 1 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:55:58,329-Speed 3469.08 samples/sec Loss 13.7666 LearningRate 0.1727 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:01,262-Speed 3492.55 samples/sec Loss 13.6888 LearningRate 0.1727 Epoch: 1 Global Step: 8960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:04,198-Speed 3488.30 samples/sec Loss 13.7861 LearningRate 0.1726 Epoch: 1 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:07,132-Speed 3491.06 samples/sec Loss 13.7083 LearningRate 0.1726 Epoch: 1 Global Step: 8980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:56:10,080-Speed 3474.53 samples/sec Loss 13.6905 LearningRate 0.1726 Epoch: 1 Global Step: 8990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:56:13,012-Speed 3493.06 samples/sec Loss 13.7269 LearningRate 0.1725 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:15,951-Speed 3484.63 samples/sec Loss 13.8677 LearningRate 0.1725 Epoch: 1 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:18,882-Speed 3495.82 samples/sec Loss 13.5892 LearningRate 0.1725 Epoch: 1 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:21,815-Speed 3491.58 samples/sec Loss 13.7344 LearningRate 0.1725 Epoch: 1 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:24,754-Speed 3484.90 samples/sec Loss 13.9224 LearningRate 0.1724 Epoch: 1 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:27,686-Speed 3494.47 samples/sec Loss 13.6494 LearningRate 0.1724 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-19 17:56:30,619-Speed 3491.25 samples/sec Loss 13.6613 LearningRate 0.1724 Epoch: 1 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:33,548-Speed 3496.91 samples/sec Loss 13.6732 LearningRate 0.1723 Epoch: 1 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:36,478-Speed 3496.48 samples/sec Loss 13.7330 LearningRate 0.1723 Epoch: 1 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:39,409-Speed 3494.90 samples/sec Loss 13.9569 LearningRate 0.1723 Epoch: 1 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:42,357-Speed 3473.81 samples/sec Loss 13.6201 LearningRate 0.1722 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:56:45,286-Speed 3497.35 samples/sec Loss 13.6169 LearningRate 0.1722 Epoch: 1 Global Step: 9110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:56:48,213-Speed 3498.96 samples/sec Loss 13.5684 LearningRate 0.1722 Epoch: 1 Global Step: 9120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:56:51,146-Speed 3492.17 samples/sec Loss 13.7940 LearningRate 0.1722 Epoch: 1 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:54,079-Speed 3492.71 samples/sec Loss 13.7851 LearningRate 0.1721 Epoch: 1 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:57,009-Speed 3495.91 samples/sec Loss 13.5580 LearningRate 0.1721 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:56:59,972-Speed 3456.80 samples/sec Loss 13.6671 LearningRate 0.1721 Epoch: 1 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:02,928-Speed 3464.80 samples/sec Loss 13.6239 LearningRate 0.1720 Epoch: 1 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 
17:57:05,886-Speed 3462.47 samples/sec Loss 13.5354 LearningRate 0.1720 Epoch: 1 Global Step: 9180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:08,824-Speed 3486.61 samples/sec Loss 13.6909 LearningRate 0.1720 Epoch: 1 Global Step: 9190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:11,756-Speed 3494.02 samples/sec Loss 13.6127 LearningRate 0.1720 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:14,690-Speed 3491.12 samples/sec Loss 13.7447 LearningRate 0.1719 Epoch: 1 Global Step: 9210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:17,626-Speed 3487.94 samples/sec Loss 13.6753 LearningRate 0.1719 Epoch: 1 Global Step: 9220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:20,571-Speed 3478.72 samples/sec Loss 13.6290 LearningRate 0.1719 Epoch: 1 Global Step: 9230 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:57:23,502-Speed 3493.93 samples/sec Loss 13.5354 LearningRate 0.1718 Epoch: 1 Global Step: 9240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:57:26,434-Speed 3494.53 samples/sec Loss 13.7740 LearningRate 0.1718 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:57:29,373-Speed 3484.33 samples/sec Loss 13.6138 LearningRate 0.1718 Epoch: 1 Global Step: 9260 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:57:32,322-Speed 3472.73 samples/sec Loss 13.5367 LearningRate 0.1718 Epoch: 1 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:35,260-Speed 3486.93 samples/sec Loss 13.4927 LearningRate 0.1717 Epoch: 1 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:38,207-Speed 3475.50 samples/sec Loss 13.5702 LearningRate 0.1717 Epoch: 1 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:41,148-Speed 3482.98 samples/sec Loss 
13.5384 LearningRate 0.1717 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:44,141-Speed 3421.69 samples/sec Loss 13.6568 LearningRate 0.1716 Epoch: 1 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:47,103-Speed 3458.46 samples/sec Loss 13.6499 LearningRate 0.1716 Epoch: 1 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:50,038-Speed 3490.42 samples/sec Loss 13.5406 LearningRate 0.1716 Epoch: 1 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:52,970-Speed 3492.64 samples/sec Loss 13.6104 LearningRate 0.1715 Epoch: 1 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:55,957-Speed 3429.11 samples/sec Loss 13.5991 LearningRate 0.1715 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:57:58,892-Speed 3489.54 samples/sec Loss 13.7826 LearningRate 0.1715 Epoch: 1 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:01,858-Speed 3453.34 samples/sec Loss 13.4885 LearningRate 0.1715 Epoch: 1 Global Step: 9370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:58:04,837-Speed 3438.69 samples/sec Loss 13.5839 LearningRate 0.1714 Epoch: 1 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:07,794-Speed 3464.29 samples/sec Loss 13.7136 LearningRate 0.1714 Epoch: 1 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:10,760-Speed 3453.05 samples/sec Loss 13.5184 LearningRate 0.1714 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:13,721-Speed 3459.98 samples/sec Loss 13.5721 LearningRate 0.1713 Epoch: 1 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:16,680-Speed 3460.37 samples/sec Loss 13.5148 LearningRate 0.1713 Epoch: 1 Global 
Step: 9420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:19,621-Speed 3484.04 samples/sec Loss 13.6262 LearningRate 0.1713 Epoch: 1 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:22,554-Speed 3491.97 samples/sec Loss 13.8746 LearningRate 0.1713 Epoch: 1 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:25,488-Speed 3490.63 samples/sec Loss 13.5098 LearningRate 0.1712 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:28,420-Speed 3494.16 samples/sec Loss 13.6918 LearningRate 0.1712 Epoch: 1 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:31,405-Speed 3430.58 samples/sec Loss 13.5547 LearningRate 0.1712 Epoch: 1 Global Step: 9470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 17:58:34,381-Speed 3443.22 samples/sec Loss 13.4780 LearningRate 0.1711 Epoch: 1 Global Step: 9480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:58:37,347-Speed 3453.38 samples/sec Loss 13.5587 LearningRate 0.1711 Epoch: 1 Global Step: 9490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:58:40,307-Speed 3459.82 samples/sec Loss 13.5778 LearningRate 0.1711 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:58:43,241-Speed 3490.91 samples/sec Loss 13.5638 LearningRate 0.1710 Epoch: 1 Global Step: 9510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:58:46,177-Speed 3489.06 samples/sec Loss 13.6033 LearningRate 0.1710 Epoch: 1 Global Step: 9520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 17:58:49,110-Speed 3492.25 samples/sec Loss 13.4732 LearningRate 0.1710 Epoch: 1 Global Step: 9530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:58:52,053-Speed 3480.22 samples/sec Loss 13.4156 LearningRate 0.1710 Epoch: 1 Global Step: 9540 Fp16 Grad Scale: 262144 
Required: 11 hours Training: 2022-01-19 17:58:55,006-Speed 3468.02 samples/sec Loss 13.4299 LearningRate 0.1709 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:58:57,949-Speed 3480.54 samples/sec Loss 13.6482 LearningRate 0.1709 Epoch: 1 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:00,913-Speed 3456.10 samples/sec Loss 13.3335 LearningRate 0.1709 Epoch: 1 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:03,848-Speed 3490.23 samples/sec Loss 13.6481 LearningRate 0.1708 Epoch: 1 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:06,832-Speed 3432.70 samples/sec Loss 13.8562 LearningRate 0.1708 Epoch: 1 Global Step: 9590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:09,794-Speed 3458.04 samples/sec Loss 13.5916 LearningRate 0.1708 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:12,839-Speed 3363.49 samples/sec Loss 13.4018 LearningRate 0.1708 Epoch: 1 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:15,840-Speed 3412.28 samples/sec Loss 13.4683 LearningRate 0.1707 Epoch: 1 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:18,778-Speed 3486.29 samples/sec Loss 13.4374 LearningRate 0.1707 Epoch: 1 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:21,756-Speed 3439.51 samples/sec Loss 13.6196 LearningRate 0.1707 Epoch: 1 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:24,691-Speed 3489.74 samples/sec Loss 13.4807 LearningRate 0.1706 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:27,633-Speed 3482.48 samples/sec Loss 13.5909 LearningRate 0.1706 Epoch: 1 Global Step: 9660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 
17:59:30,584-Speed 3471.07 samples/sec Loss 13.4738 LearningRate 0.1706 Epoch: 1 Global Step: 9670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:59:33,526-Speed 3481.66 samples/sec Loss 13.4354 LearningRate 0.1706 Epoch: 1 Global Step: 9680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 17:59:36,452-Speed 3499.78 samples/sec Loss 13.5925 LearningRate 0.1705 Epoch: 1 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:39,380-Speed 3498.74 samples/sec Loss 13.5845 LearningRate 0.1705 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:42,312-Speed 3493.01 samples/sec Loss 13.4305 LearningRate 0.1705 Epoch: 1 Global Step: 9710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:45,245-Speed 3492.41 samples/sec Loss 13.5080 LearningRate 0.1704 Epoch: 1 Global Step: 9720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:48,190-Speed 3478.59 samples/sec Loss 13.4052 LearningRate 0.1704 Epoch: 1 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:51,127-Speed 3487.25 samples/sec Loss 13.3463 LearningRate 0.1704 Epoch: 1 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:54,059-Speed 3492.84 samples/sec Loss 13.4445 LearningRate 0.1703 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:56,990-Speed 3495.43 samples/sec Loss 13.3804 LearningRate 0.1703 Epoch: 1 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 17:59:59,922-Speed 3492.59 samples/sec Loss 13.3378 LearningRate 0.1703 Epoch: 1 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:02,880-Speed 3463.22 samples/sec Loss 13.5434 LearningRate 0.1703 Epoch: 1 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:05,821-Speed 3483.44 samples/sec Loss 
13.4351 LearningRate 0.1702 Epoch: 1 Global Step: 9790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:00:08,754-Speed 3491.96 samples/sec Loss 13.6411 LearningRate 0.1702 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:00:11,688-Speed 3491.21 samples/sec Loss 13.3947 LearningRate 0.1702 Epoch: 1 Global Step: 9810 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:00:14,617-Speed 3497.47 samples/sec Loss 13.4461 LearningRate 0.1701 Epoch: 1 Global Step: 9820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:00:17,556-Speed 3483.94 samples/sec Loss 13.3674 LearningRate 0.1701 Epoch: 1 Global Step: 9830 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:00:20,496-Speed 3484.20 samples/sec Loss 13.4400 LearningRate 0.1701 Epoch: 1 Global Step: 9840 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:00:23,420-Speed 3503.07 samples/sec Loss 13.4692 LearningRate 0.1701 Epoch: 1 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:26,353-Speed 3493.15 samples/sec Loss 13.6762 LearningRate 0.1700 Epoch: 1 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:29,287-Speed 3490.12 samples/sec Loss 13.4546 LearningRate 0.1700 Epoch: 1 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:32,221-Speed 3492.01 samples/sec Loss 13.4324 LearningRate 0.1700 Epoch: 1 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:35,150-Speed 3496.81 samples/sec Loss 13.5038 LearningRate 0.1699 Epoch: 1 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:38,081-Speed 3495.12 samples/sec Loss 13.4071 LearningRate 0.1699 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:41,036-Speed 3466.23 samples/sec Loss 13.4834 LearningRate 0.1699 Epoch: 1 Global 
Step: 9910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:43,987-Speed 3471.12 samples/sec Loss 13.3773 LearningRate 0.1699 Epoch: 1 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:46,960-Speed 3445.78 samples/sec Loss 13.3338 LearningRate 0.1698 Epoch: 1 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:49,921-Speed 3459.09 samples/sec Loss 13.6277 LearningRate 0.1698 Epoch: 1 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:52,863-Speed 3480.51 samples/sec Loss 13.3754 LearningRate 0.1698 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:00:55,786-Speed 3504.61 samples/sec Loss 13.3896 LearningRate 0.1697 Epoch: 1 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:00:58,721-Speed 3489.50 samples/sec Loss 13.2509 LearningRate 0.1697 Epoch: 1 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:01:01,653-Speed 3494.54 samples/sec Loss 13.4258 LearningRate 0.1697 Epoch: 1 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:01:04,584-Speed 3494.01 samples/sec Loss 13.6010 LearningRate 0.1696 Epoch: 1 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:01:07,519-Speed 3489.84 samples/sec Loss 13.3315 LearningRate 0.1696 Epoch: 1 Global Step: 10000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:01:50,443-[lfw][10000]XNorm: 22.673865 Training: 2022-01-19 18:01:50,444-[lfw][10000]Accuracy-Flip: 0.99333+-0.00357 Training: 2022-01-19 18:01:50,444-[lfw][10000]Accuracy-Highest: 0.99433 Training: 2022-01-19 18:02:40,281-[cfp_fp][10000]XNorm: 19.182854 Training: 2022-01-19 18:02:40,282-[cfp_fp][10000]Accuracy-Flip: 0.92843+-0.01145 Training: 2022-01-19 18:02:40,282-[cfp_fp][10000]Accuracy-Highest: 0.92843 Training: 2022-01-19 
18:03:23,132-[agedb_30][10000]XNorm: 22.003015 Training: 2022-01-19 18:03:23,133-[agedb_30][10000]Accuracy-Flip: 0.95750+-0.00917 Training: 2022-01-19 18:03:23,134-[agedb_30][10000]Accuracy-Highest: 0.95750 Training: 2022-01-19 18:03:26,062-Speed 73.91 samples/sec Loss 13.2992 LearningRate 0.1696 Epoch: 1 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:03:28,981-Speed 3508.65 samples/sec Loss 13.3748 LearningRate 0.1696 Epoch: 1 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:03:31,914-Speed 3491.76 samples/sec Loss 13.3564 LearningRate 0.1695 Epoch: 1 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:03:34,842-Speed 3498.51 samples/sec Loss 13.5014 LearningRate 0.1695 Epoch: 1 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:03:37,765-Speed 3503.86 samples/sec Loss 13.4185 LearningRate 0.1695 Epoch: 1 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:03:40,700-Speed 3489.47 samples/sec Loss 13.2576 LearningRate 0.1694 Epoch: 1 Global Step: 10060 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:03:43,655-Speed 3466.78 samples/sec Loss 13.3821 LearningRate 0.1694 Epoch: 1 Global Step: 10070 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:03:46,616-Speed 3460.24 samples/sec Loss 13.3725 LearningRate 0.1694 Epoch: 1 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:03:49,545-Speed 3496.90 samples/sec Loss 13.3013 LearningRate 0.1694 Epoch: 1 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:03:52,487-Speed 3481.06 samples/sec Loss 13.4101 LearningRate 0.1693 Epoch: 1 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:03:55,488-Speed 3413.30 samples/sec Loss 13.1723 LearningRate 0.1693 Epoch: 1 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 
12 hours Training: 2022-01-19 18:04:09,183-Speed 747.79 samples/sec Loss 12.8873 LearningRate 0.1693 Epoch: 2 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:12,127-Speed 3478.90 samples/sec Loss 12.5168 LearningRate 0.1692 Epoch: 2 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:15,102-Speed 3443.35 samples/sec Loss 12.8642 LearningRate 0.1692 Epoch: 2 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:18,075-Speed 3445.66 samples/sec Loss 12.8724 LearningRate 0.1692 Epoch: 2 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:21,010-Speed 3489.66 samples/sec Loss 12.4969 LearningRate 0.1692 Epoch: 2 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:23,965-Speed 3466.71 samples/sec Loss 12.5972 LearningRate 0.1691 Epoch: 2 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:26,899-Speed 3491.10 samples/sec Loss 12.7424 LearningRate 0.1691 Epoch: 2 Global Step: 10180 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:04:29,852-Speed 3467.81 samples/sec Loss 12.8003 LearningRate 0.1691 Epoch: 2 Global Step: 10190 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:04:32,856-Speed 3409.91 samples/sec Loss 12.5903 LearningRate 0.1690 Epoch: 2 Global Step: 10200 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:04:35,830-Speed 3444.25 samples/sec Loss 12.8140 LearningRate 0.1690 Epoch: 2 Global Step: 10210 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:04:38,781-Speed 3469.87 samples/sec Loss 12.8785 LearningRate 0.1690 Epoch: 2 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:41,719-Speed 3486.77 samples/sec Loss 12.7269 LearningRate 0.1689 Epoch: 2 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 
18:04:44,647-Speed 3498.75 samples/sec Loss 12.8560 LearningRate 0.1689 Epoch: 2 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:47,581-Speed 3491.20 samples/sec Loss 12.5643 LearningRate 0.1689 Epoch: 2 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:50,521-Speed 3483.30 samples/sec Loss 12.7530 LearningRate 0.1689 Epoch: 2 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:53,459-Speed 3487.10 samples/sec Loss 12.7863 LearningRate 0.1688 Epoch: 2 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:56,391-Speed 3493.96 samples/sec Loss 12.8521 LearningRate 0.1688 Epoch: 2 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:04:59,329-Speed 3486.04 samples/sec Loss 12.8809 LearningRate 0.1688 Epoch: 2 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:02,289-Speed 3459.51 samples/sec Loss 12.8930 LearningRate 0.1687 Epoch: 2 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:05,230-Speed 3483.13 samples/sec Loss 12.9103 LearningRate 0.1687 Epoch: 2 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:08,161-Speed 3493.68 samples/sec Loss 12.8933 LearningRate 0.1687 Epoch: 2 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:11,144-Speed 3434.25 samples/sec Loss 12.8699 LearningRate 0.1687 Epoch: 2 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:14,146-Speed 3413.06 samples/sec Loss 13.0169 LearningRate 0.1686 Epoch: 2 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:17,139-Speed 3421.79 samples/sec Loss 13.0889 LearningRate 0.1686 Epoch: 2 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:20,078-Speed 3485.62 
samples/sec Loss 13.0384 LearningRate 0.1686 Epoch: 2 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:23,053-Speed 3442.74 samples/sec Loss 13.0534 LearningRate 0.1685 Epoch: 2 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:25,985-Speed 3494.18 samples/sec Loss 13.0348 LearningRate 0.1685 Epoch: 2 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:28,919-Speed 3490.31 samples/sec Loss 12.9330 LearningRate 0.1685 Epoch: 2 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:31,857-Speed 3486.02 samples/sec Loss 12.9934 LearningRate 0.1685 Epoch: 2 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:34,792-Speed 3490.17 samples/sec Loss 13.1516 LearningRate 0.1684 Epoch: 2 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:37,733-Speed 3482.65 samples/sec Loss 12.9533 LearningRate 0.1684 Epoch: 2 Global Step: 10420 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:05:40,677-Speed 3478.92 samples/sec Loss 13.1892 LearningRate 0.1684 Epoch: 2 Global Step: 10430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:05:43,616-Speed 3485.98 samples/sec Loss 13.0882 LearningRate 0.1683 Epoch: 2 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:46,551-Speed 3490.11 samples/sec Loss 13.0001 LearningRate 0.1683 Epoch: 2 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:49,489-Speed 3485.31 samples/sec Loss 12.9910 LearningRate 0.1683 Epoch: 2 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:52,424-Speed 3491.05 samples/sec Loss 12.8605 LearningRate 0.1683 Epoch: 2 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:55,352-Speed 3497.52 samples/sec Loss 13.0149 
LearningRate 0.1682 Epoch: 2 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:05:58,336-Speed 3432.61 samples/sec Loss 12.8920 LearningRate 0.1682 Epoch: 2 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:01,288-Speed 3469.97 samples/sec Loss 13.1022 LearningRate 0.1682 Epoch: 2 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:04,240-Speed 3469.54 samples/sec Loss 12.9658 LearningRate 0.1681 Epoch: 2 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:07,175-Speed 3490.11 samples/sec Loss 12.9800 LearningRate 0.1681 Epoch: 2 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:10,114-Speed 3484.67 samples/sec Loss 13.0635 LearningRate 0.1681 Epoch: 2 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:13,081-Speed 3452.01 samples/sec Loss 13.2553 LearningRate 0.1680 Epoch: 2 Global Step: 10540 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:06:16,022-Speed 3483.47 samples/sec Loss 13.1175 LearningRate 0.1680 Epoch: 2 Global Step: 10550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:06:18,973-Speed 3470.32 samples/sec Loss 13.0705 LearningRate 0.1680 Epoch: 2 Global Step: 10560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:06:21,912-Speed 3486.12 samples/sec Loss 13.1489 LearningRate 0.1680 Epoch: 2 Global Step: 10570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:06:24,897-Speed 3431.31 samples/sec Loss 13.2395 LearningRate 0.1679 Epoch: 2 Global Step: 10580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:06:27,823-Speed 3500.35 samples/sec Loss 13.1620 LearningRate 0.1679 Epoch: 2 Global Step: 10590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:30,846-Speed 3388.73 samples/sec Loss 13.0650 LearningRate 0.1679 Epoch: 2 
Global Step: 10600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:33,792-Speed 3476.18 samples/sec Loss 13.1113 LearningRate 0.1678 Epoch: 2 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:36,723-Speed 3495.47 samples/sec Loss 13.1792 LearningRate 0.1678 Epoch: 2 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:39,670-Speed 3474.67 samples/sec Loss 13.0787 LearningRate 0.1678 Epoch: 2 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:42,664-Speed 3420.95 samples/sec Loss 13.0536 LearningRate 0.1678 Epoch: 2 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:45,674-Speed 3402.57 samples/sec Loss 13.2888 LearningRate 0.1677 Epoch: 2 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:48,601-Speed 3499.80 samples/sec Loss 13.0687 LearningRate 0.1677 Epoch: 2 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:51,534-Speed 3492.91 samples/sec Loss 13.1089 LearningRate 0.1677 Epoch: 2 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:54,468-Speed 3489.73 samples/sec Loss 13.4034 LearningRate 0.1676 Epoch: 2 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:06:57,412-Speed 3479.44 samples/sec Loss 13.2703 LearningRate 0.1676 Epoch: 2 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:00,349-Speed 3487.51 samples/sec Loss 13.3808 LearningRate 0.1676 Epoch: 2 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:03,281-Speed 3494.33 samples/sec Loss 13.2013 LearningRate 0.1676 Epoch: 2 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:06,210-Speed 3496.82 samples/sec Loss 13.2032 LearningRate 0.1675 Epoch: 2 Global Step: 10720 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:09,147-Speed 3486.94 samples/sec Loss 13.1701 LearningRate 0.1675 Epoch: 2 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:12,094-Speed 3475.70 samples/sec Loss 13.1401 LearningRate 0.1675 Epoch: 2 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:15,051-Speed 3463.38 samples/sec Loss 13.2254 LearningRate 0.1674 Epoch: 2 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:17,980-Speed 3497.25 samples/sec Loss 13.0282 LearningRate 0.1674 Epoch: 2 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:20,943-Speed 3457.68 samples/sec Loss 13.1808 LearningRate 0.1674 Epoch: 2 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:23,927-Speed 3431.71 samples/sec Loss 13.3148 LearningRate 0.1674 Epoch: 2 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:26,899-Speed 3447.48 samples/sec Loss 13.1618 LearningRate 0.1673 Epoch: 2 Global Step: 10790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:07:29,847-Speed 3474.41 samples/sec Loss 12.9890 LearningRate 0.1673 Epoch: 2 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:32,777-Speed 3495.49 samples/sec Loss 13.0312 LearningRate 0.1673 Epoch: 2 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:35,701-Speed 3503.83 samples/sec Loss 13.2560 LearningRate 0.1672 Epoch: 2 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:38,628-Speed 3498.93 samples/sec Loss 13.2034 LearningRate 0.1672 Epoch: 2 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:41,563-Speed 3489.39 samples/sec Loss 13.0617 LearningRate 0.1672 Epoch: 2 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-01-19 18:07:44,549-Speed 3429.90 samples/sec Loss 13.1911 LearningRate 0.1672 Epoch: 2 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:47,495-Speed 3476.90 samples/sec Loss 13.1679 LearningRate 0.1671 Epoch: 2 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:50,449-Speed 3467.22 samples/sec Loss 12.9666 LearningRate 0.1671 Epoch: 2 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:53,384-Speed 3490.08 samples/sec Loss 12.9759 LearningRate 0.1671 Epoch: 2 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:56,327-Speed 3480.50 samples/sec Loss 12.9945 LearningRate 0.1670 Epoch: 2 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:07:59,256-Speed 3496.98 samples/sec Loss 13.0662 LearningRate 0.1670 Epoch: 2 Global Step: 10900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:08:02,186-Speed 3496.46 samples/sec Loss 13.2451 LearningRate 0.1670 Epoch: 2 Global Step: 10910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:08:05,109-Speed 3503.10 samples/sec Loss 13.0005 LearningRate 0.1669 Epoch: 2 Global Step: 10920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:08,037-Speed 3499.26 samples/sec Loss 13.0321 LearningRate 0.1669 Epoch: 2 Global Step: 10930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:10,966-Speed 3496.65 samples/sec Loss 13.1918 LearningRate 0.1669 Epoch: 2 Global Step: 10940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:13,894-Speed 3497.28 samples/sec Loss 12.9267 LearningRate 0.1669 Epoch: 2 Global Step: 10950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:16,860-Speed 3453.54 samples/sec Loss 13.1346 LearningRate 0.1668 Epoch: 2 Global Step: 10960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 
18:08:19,815-Speed 3466.02 samples/sec Loss 13.0404 LearningRate 0.1668 Epoch: 2 Global Step: 10970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:22,794-Speed 3439.40 samples/sec Loss 13.2165 LearningRate 0.1668 Epoch: 2 Global Step: 10980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:25,755-Speed 3458.79 samples/sec Loss 13.2156 LearningRate 0.1667 Epoch: 2 Global Step: 10990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:28,682-Speed 3499.92 samples/sec Loss 13.0977 LearningRate 0.1667 Epoch: 2 Global Step: 11000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:31,614-Speed 3493.52 samples/sec Loss 13.1103 LearningRate 0.1667 Epoch: 2 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:34,548-Speed 3491.11 samples/sec Loss 13.1377 LearningRate 0.1667 Epoch: 2 Global Step: 11020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:08:37,482-Speed 3490.51 samples/sec Loss 13.1811 LearningRate 0.1666 Epoch: 2 Global Step: 11030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:08:40,411-Speed 3497.18 samples/sec Loss 13.0388 LearningRate 0.1666 Epoch: 2 Global Step: 11040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:08:43,351-Speed 3483.58 samples/sec Loss 13.2101 LearningRate 0.1666 Epoch: 2 Global Step: 11050 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:08:46,271-Speed 3508.04 samples/sec Loss 12.9699 LearningRate 0.1665 Epoch: 2 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:49,198-Speed 3498.59 samples/sec Loss 12.9450 LearningRate 0.1665 Epoch: 2 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:52,166-Speed 3451.98 samples/sec Loss 13.2868 LearningRate 0.1665 Epoch: 2 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:55,098-Speed 3492.72 
samples/sec Loss 13.0954 LearningRate 0.1665 Epoch: 2 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:08:58,034-Speed 3489.35 samples/sec Loss 13.0192 LearningRate 0.1664 Epoch: 2 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:00,965-Speed 3493.76 samples/sec Loss 13.0716 LearningRate 0.1664 Epoch: 2 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:03,880-Speed 3513.64 samples/sec Loss 13.2133 LearningRate 0.1664 Epoch: 2 Global Step: 11120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:06,808-Speed 3499.55 samples/sec Loss 13.1478 LearningRate 0.1663 Epoch: 2 Global Step: 11130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:09,735-Speed 3499.23 samples/sec Loss 13.0336 LearningRate 0.1663 Epoch: 2 Global Step: 11140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:12,667-Speed 3493.86 samples/sec Loss 13.0589 LearningRate 0.1663 Epoch: 2 Global Step: 11150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:15,603-Speed 3488.23 samples/sec Loss 13.0868 LearningRate 0.1663 Epoch: 2 Global Step: 11160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:18,540-Speed 3486.95 samples/sec Loss 13.1301 LearningRate 0.1662 Epoch: 2 Global Step: 11170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:21,514-Speed 3445.08 samples/sec Loss 13.1866 LearningRate 0.1662 Epoch: 2 Global Step: 11180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:24,540-Speed 3384.79 samples/sec Loss 13.0925 LearningRate 0.1662 Epoch: 2 Global Step: 11190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:27,529-Speed 3426.33 samples/sec Loss 13.0760 LearningRate 0.1661 Epoch: 2 Global Step: 11200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:30,539-Speed 3402.73 samples/sec Loss 13.1680 LearningRate 
0.1661 Epoch: 2 Global Step: 11210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:09:33,498-Speed 3462.63 samples/sec Loss 13.0604 LearningRate 0.1661 Epoch: 2 Global Step: 11220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:36,433-Speed 3488.96 samples/sec Loss 13.0961 LearningRate 0.1661 Epoch: 2 Global Step: 11230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:39,360-Speed 3499.03 samples/sec Loss 12.9899 LearningRate 0.1660 Epoch: 2 Global Step: 11240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:42,293-Speed 3492.75 samples/sec Loss 13.2648 LearningRate 0.1660 Epoch: 2 Global Step: 11250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:45,231-Speed 3485.96 samples/sec Loss 13.1945 LearningRate 0.1660 Epoch: 2 Global Step: 11260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:48,160-Speed 3497.28 samples/sec Loss 13.2100 LearningRate 0.1659 Epoch: 2 Global Step: 11270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:51,092-Speed 3492.91 samples/sec Loss 13.0125 LearningRate 0.1659 Epoch: 2 Global Step: 11280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:54,020-Speed 3498.51 samples/sec Loss 13.0795 LearningRate 0.1659 Epoch: 2 Global Step: 11290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:56,959-Speed 3485.76 samples/sec Loss 13.0316 LearningRate 0.1659 Epoch: 2 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:09:59,926-Speed 3451.41 samples/sec Loss 12.9450 LearningRate 0.1658 Epoch: 2 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:02,911-Speed 3431.75 samples/sec Loss 13.0631 LearningRate 0.1658 Epoch: 2 Global Step: 11320 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:10:05,893-Speed 3434.11 samples/sec Loss 12.9768 LearningRate 0.1658 Epoch: 2 Global Step: 
11330 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:10:08,820-Speed 3499.22 samples/sec Loss 13.1579 LearningRate 0.1657 Epoch: 2 Global Step: 11340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:11,786-Speed 3454.02 samples/sec Loss 13.0017 LearningRate 0.1657 Epoch: 2 Global Step: 11350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:14,746-Speed 3460.34 samples/sec Loss 12.9459 LearningRate 0.1657 Epoch: 2 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:17,724-Speed 3439.50 samples/sec Loss 12.9547 LearningRate 0.1657 Epoch: 2 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:20,730-Speed 3407.55 samples/sec Loss 12.8923 LearningRate 0.1656 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:23,661-Speed 3494.28 samples/sec Loss 13.0893 LearningRate 0.1656 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:26,594-Speed 3492.66 samples/sec Loss 13.0899 LearningRate 0.1656 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:29,532-Speed 3486.19 samples/sec Loss 13.1161 LearningRate 0.1655 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:32,495-Speed 3456.17 samples/sec Loss 13.1643 LearningRate 0.1655 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:35,466-Speed 3447.78 samples/sec Loss 13.0082 LearningRate 0.1655 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:38,398-Speed 3493.67 samples/sec Loss 12.8798 LearningRate 0.1654 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:10:41,360-Speed 3459.16 samples/sec Loss 13.2423 LearningRate 0.1654 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 262144 
Required: 11 hours Training: 2022-01-19 18:10:44,306-Speed 3475.78 samples/sec Loss 13.1061 LearningRate 0.1654 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:47,248-Speed 3481.74 samples/sec Loss 13.0022 LearningRate 0.1654 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:50,247-Speed 3415.09 samples/sec Loss 13.1853 LearningRate 0.1653 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:53,223-Speed 3441.24 samples/sec Loss 13.1938 LearningRate 0.1653 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:56,162-Speed 3486.05 samples/sec Loss 13.0709 LearningRate 0.1653 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:10:59,092-Speed 3495.81 samples/sec Loss 13.2245 LearningRate 0.1652 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:02,022-Speed 3496.04 samples/sec Loss 13.1205 LearningRate 0.1652 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:04,947-Speed 3500.70 samples/sec Loss 12.8917 LearningRate 0.1652 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:07,898-Speed 3471.21 samples/sec Loss 12.9878 LearningRate 0.1652 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:10,826-Speed 3498.53 samples/sec Loss 12.9955 LearningRate 0.1651 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:13,769-Speed 3480.31 samples/sec Loss 12.8660 LearningRate 0.1651 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:11:16,763-Speed 3421.76 samples/sec Loss 12.9581 LearningRate 0.1651 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-19 18:11:19,747-Speed 3432.24 samples/sec Loss 12.8806 LearningRate 0.1650 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:22,764-Speed 3394.89 samples/sec Loss 12.9043 LearningRate 0.1650 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:25,790-Speed 3385.07 samples/sec Loss 13.1296 LearningRate 0.1650 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:28,737-Speed 3475.93 samples/sec Loss 13.1326 LearningRate 0.1650 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:31,706-Speed 3449.80 samples/sec Loss 12.9564 LearningRate 0.1649 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:34,670-Speed 3455.07 samples/sec Loss 13.0136 LearningRate 0.1649 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:37,618-Speed 3475.03 samples/sec Loss 12.9186 LearningRate 0.1649 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:40,577-Speed 3462.24 samples/sec Loss 12.9860 LearningRate 0.1648 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:43,521-Speed 3478.26 samples/sec Loss 13.0206 LearningRate 0.1648 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:46,467-Speed 3477.10 samples/sec Loss 12.8516 LearningRate 0.1648 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:11:49,430-Speed 3457.19 samples/sec Loss 12.8505 LearningRate 0.1648 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:52,463-Speed 3377.00 samples/sec Loss 13.0512 LearningRate 0.1647 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:55,436-Speed 
3444.37 samples/sec Loss 13.0624 LearningRate 0.1647 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:11:58,381-Speed 3478.00 samples/sec Loss 13.0629 LearningRate 0.1647 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:01,312-Speed 3494.88 samples/sec Loss 13.0451 LearningRate 0.1646 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:04,252-Speed 3484.01 samples/sec Loss 12.9538 LearningRate 0.1646 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:07,205-Speed 3469.17 samples/sec Loss 12.8411 LearningRate 0.1646 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:10,135-Speed 3495.94 samples/sec Loss 13.0264 LearningRate 0.1646 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:13,069-Speed 3490.93 samples/sec Loss 12.9326 LearningRate 0.1645 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:16,025-Speed 3464.68 samples/sec Loss 12.8536 LearningRate 0.1645 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:18,959-Speed 3490.89 samples/sec Loss 12.7819 LearningRate 0.1645 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:12:21,932-Speed 3445.06 samples/sec Loss 13.0092 LearningRate 0.1644 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:12:24,892-Speed 3460.25 samples/sec Loss 12.8432 LearningRate 0.1644 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:27,841-Speed 3473.66 samples/sec Loss 13.0560 LearningRate 0.1644 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:30,780-Speed 3485.35 samples/sec Loss 
13.0905 LearningRate 0.1644 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:33,717-Speed 3487.89 samples/sec Loss 13.0776 LearningRate 0.1643 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:36,653-Speed 3488.44 samples/sec Loss 12.9146 LearningRate 0.1643 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:39,625-Speed 3447.21 samples/sec Loss 13.0552 LearningRate 0.1643 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:42,708-Speed 3322.17 samples/sec Loss 12.9437 LearningRate 0.1642 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:45,663-Speed 3466.39 samples/sec Loss 12.9747 LearningRate 0.1642 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:48,600-Speed 3487.30 samples/sec Loss 12.8475 LearningRate 0.1642 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:51,568-Speed 3450.23 samples/sec Loss 12.9290 LearningRate 0.1642 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:12:54,504-Speed 3488.22 samples/sec Loss 13.1261 LearningRate 0.1641 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:12:57,447-Speed 3480.44 samples/sec Loss 12.8888 LearningRate 0.1641 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:00,406-Speed 3462.85 samples/sec Loss 13.0637 LearningRate 0.1641 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:03,353-Speed 3474.81 samples/sec Loss 12.9766 LearningRate 0.1640 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:06,286-Speed 3492.89 samples/sec Loss 12.8140 LearningRate 0.1640 
Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:09,219-Speed 3491.82 samples/sec Loss 12.8688 LearningRate 0.1640 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:12,204-Speed 3431.04 samples/sec Loss 12.7540 LearningRate 0.1640 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:15,137-Speed 3492.41 samples/sec Loss 13.0701 LearningRate 0.1639 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:18,099-Speed 3458.49 samples/sec Loss 12.8429 LearningRate 0.1639 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:21,039-Speed 3483.66 samples/sec Loss 12.9583 LearningRate 0.1639 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:13:23,985-Speed 3476.12 samples/sec Loss 12.9960 LearningRate 0.1638 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:14:06,678-[lfw][12000]XNorm: 21.435903 Training: 2022-01-19 18:14:06,678-[lfw][12000]Accuracy-Flip: 0.99467+-0.00427 Training: 2022-01-19 18:14:06,679-[lfw][12000]Accuracy-Highest: 0.99467 Training: 2022-01-19 18:14:56,342-[cfp_fp][12000]XNorm: 18.684349 Training: 2022-01-19 18:14:56,343-[cfp_fp][12000]Accuracy-Flip: 0.92843+-0.01363 Training: 2022-01-19 18:14:56,344-[cfp_fp][12000]Accuracy-Highest: 0.92843 Training: 2022-01-19 18:15:39,160-[agedb_30][12000]XNorm: 20.534572 Training: 2022-01-19 18:15:39,161-[agedb_30][12000]Accuracy-Flip: 0.96017+-0.01257 Training: 2022-01-19 18:15:39,161-[agedb_30][12000]Accuracy-Highest: 0.96017 Training: 2022-01-19 18:15:42,090-Speed 74.15 samples/sec Loss 12.7153 LearningRate 0.1638 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:15:45,012-Speed 3504.77 samples/sec Loss 12.8828 LearningRate 0.1638 Epoch: 2 Global Step: 
12020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:15:47,939-Speed 3499.74 samples/sec Loss 12.9693 LearningRate 0.1638 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:15:50,892-Speed 3469.31 samples/sec Loss 12.9743 LearningRate 0.1637 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:15:53,815-Speed 3503.24 samples/sec Loss 12.8667 LearningRate 0.1637 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:15:56,746-Speed 3494.60 samples/sec Loss 12.8397 LearningRate 0.1637 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:15:59,728-Speed 3435.47 samples/sec Loss 13.0780 LearningRate 0.1636 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:02,719-Speed 3424.31 samples/sec Loss 12.9817 LearningRate 0.1636 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:05,649-Speed 3496.04 samples/sec Loss 13.0334 LearningRate 0.1636 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:08,631-Speed 3434.80 samples/sec Loss 12.9764 LearningRate 0.1636 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:11,595-Speed 3455.03 samples/sec Loss 12.9162 LearningRate 0.1635 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:14,581-Speed 3430.10 samples/sec Loss 12.8775 LearningRate 0.1635 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:16:17,564-Speed 3433.70 samples/sec Loss 13.0525 LearningRate 0.1635 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:16:20,541-Speed 3440.46 samples/sec Loss 12.8736 LearningRate 0.1634 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 262144 
Required: 12 hours Training: 2022-01-19 18:16:23,538-Speed 3417.22 samples/sec Loss 12.8899 LearningRate 0.1634 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:16:26,491-Speed 3469.88 samples/sec Loss 12.8234 LearningRate 0.1634 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:16:29,426-Speed 3490.20 samples/sec Loss 12.8957 LearningRate 0.1634 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:32,362-Speed 3488.44 samples/sec Loss 12.9537 LearningRate 0.1633 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:35,303-Speed 3482.00 samples/sec Loss 12.9766 LearningRate 0.1633 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:38,243-Speed 3483.92 samples/sec Loss 12.9090 LearningRate 0.1633 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:41,194-Speed 3470.91 samples/sec Loss 12.9358 LearningRate 0.1632 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:44,206-Speed 3401.12 samples/sec Loss 12.7826 LearningRate 0.1632 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:47,201-Speed 3420.05 samples/sec Loss 13.0075 LearningRate 0.1632 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:50,166-Speed 3454.61 samples/sec Loss 13.0761 LearningRate 0.1632 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:53,108-Speed 3481.56 samples/sec Loss 12.9993 LearningRate 0.1631 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:16:56,067-Speed 3463.04 samples/sec Loss 12.9467 LearningRate 0.1631 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-01-19 18:16:59,002-Speed 3488.94 samples/sec Loss 13.0049 LearningRate 0.1631 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:17:01,941-Speed 3485.04 samples/sec Loss 12.7698 LearningRate 0.1630 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:17:04,965-Speed 3387.73 samples/sec Loss 12.8351 LearningRate 0.1630 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:17:07,965-Speed 3413.96 samples/sec Loss 12.8383 LearningRate 0.1630 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:10,973-Speed 3404.89 samples/sec Loss 12.9476 LearningRate 0.1630 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:13,908-Speed 3490.66 samples/sec Loss 13.0383 LearningRate 0.1629 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:16,977-Speed 3336.99 samples/sec Loss 12.8496 LearningRate 0.1629 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:19,980-Speed 3411.17 samples/sec Loss 12.8294 LearningRate 0.1629 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:22,921-Speed 3483.18 samples/sec Loss 12.6431 LearningRate 0.1628 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:25,892-Speed 3447.97 samples/sec Loss 12.7614 LearningRate 0.1628 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:28,824-Speed 3492.48 samples/sec Loss 12.7714 LearningRate 0.1628 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:31,752-Speed 3497.79 samples/sec Loss 12.9143 LearningRate 0.1628 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:34,684-Speed 
3493.96 samples/sec Loss 12.7056 LearningRate 0.1627 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:37,638-Speed 3467.58 samples/sec Loss 12.8426 LearningRate 0.1627 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-19 18:17:40,560-Speed 3504.94 samples/sec Loss 12.8434 LearningRate 0.1627 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-19 18:17:43,521-Speed 3459.63 samples/sec Loss 12.9116 LearningRate 0.1626 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:17:46,461-Speed 3484.10 samples/sec Loss 12.8269 LearningRate 0.1626 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:17:49,431-Speed 3447.51 samples/sec Loss 12.8967 LearningRate 0.1626 Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:17:52,363-Speed 3494.53 samples/sec Loss 12.8860 LearningRate 0.1626 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:17:55,291-Speed 3498.64 samples/sec Loss 12.6213 LearningRate 0.1625 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:17:58,227-Speed 3488.11 samples/sec Loss 12.6898 LearningRate 0.1625 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:01,160-Speed 3492.18 samples/sec Loss 12.8462 LearningRate 0.1625 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:04,133-Speed 3444.63 samples/sec Loss 12.7245 LearningRate 0.1624 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:07,063-Speed 3496.29 samples/sec Loss 12.7575 LearningRate 0.1624 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:09,997-Speed 3490.57 samples/sec Loss 
12.7045 LearningRate 0.1624 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:18:12,937-Speed 3483.84 samples/sec Loss 12.7153 LearningRate 0.1624 Epoch: 2 Global Step: 12520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:15,893-Speed 3465.04 samples/sec Loss 13.0160 LearningRate 0.1623 Epoch: 2 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:18,836-Speed 3480.86 samples/sec Loss 12.8329 LearningRate 0.1623 Epoch: 2 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:21,765-Speed 3497.26 samples/sec Loss 12.7802 LearningRate 0.1623 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:24,805-Speed 3369.37 samples/sec Loss 12.9902 LearningRate 0.1622 Epoch: 2 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:27,743-Speed 3486.07 samples/sec Loss 12.7940 LearningRate 0.1622 Epoch: 2 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:30,783-Speed 3369.40 samples/sec Loss 12.6768 LearningRate 0.1622 Epoch: 2 Global Step: 12580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:33,772-Speed 3425.86 samples/sec Loss 12.6840 LearningRate 0.1622 Epoch: 2 Global Step: 12590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:36,706-Speed 3491.53 samples/sec Loss 12.7984 LearningRate 0.1621 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:39,672-Speed 3454.20 samples/sec Loss 12.5499 LearningRate 0.1621 Epoch: 2 Global Step: 12610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:18:42,616-Speed 3478.14 samples/sec Loss 12.6036 LearningRate 0.1621 Epoch: 2 Global Step: 12620 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:18:45,546-Speed 3496.48 samples/sec Loss 12.7831 LearningRate 0.1620 
Epoch: 2 Global Step: 12630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:18:48,502-Speed 3465.66 samples/sec Loss 12.7641 LearningRate 0.1620 Epoch: 2 Global Step: 12640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:18:51,436-Speed 3490.19 samples/sec Loss 12.9042 LearningRate 0.1620 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:18:54,371-Speed 3490.76 samples/sec Loss 12.7121 LearningRate 0.1620 Epoch: 2 Global Step: 12660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:18:57,288-Speed 3510.73 samples/sec Loss 12.6469 LearningRate 0.1619 Epoch: 2 Global Step: 12670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:00,289-Speed 3412.50 samples/sec Loss 12.8347 LearningRate 0.1619 Epoch: 2 Global Step: 12680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:03,218-Speed 3497.54 samples/sec Loss 12.7335 LearningRate 0.1619 Epoch: 2 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:06,149-Speed 3493.92 samples/sec Loss 12.8281 LearningRate 0.1618 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:09,079-Speed 3496.55 samples/sec Loss 12.8116 LearningRate 0.1618 Epoch: 2 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:12,010-Speed 3494.67 samples/sec Loss 12.7364 LearningRate 0.1618 Epoch: 2 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:14,966-Speed 3464.35 samples/sec Loss 12.6615 LearningRate 0.1618 Epoch: 2 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:17,904-Speed 3487.06 samples/sec Loss 12.5974 LearningRate 0.1617 Epoch: 2 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:20,831-Speed 3499.50 samples/sec Loss 12.6862 LearningRate 0.1617 Epoch: 2 Global Step: 12750 
Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:23,755-Speed 3503.09 samples/sec Loss 12.8966 LearningRate 0.1617 Epoch: 2 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:26,679-Speed 3503.77 samples/sec Loss 12.8662 LearningRate 0.1616 Epoch: 2 Global Step: 12770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:19:29,620-Speed 3482.75 samples/sec Loss 12.7505 LearningRate 0.1616 Epoch: 2 Global Step: 12780 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:19:32,559-Speed 3484.58 samples/sec Loss 12.6556 LearningRate 0.1616 Epoch: 2 Global Step: 12790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:19:35,525-Speed 3453.66 samples/sec Loss 12.6891 LearningRate 0.1616 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:19:38,457-Speed 3493.38 samples/sec Loss 12.7795 LearningRate 0.1615 Epoch: 2 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:41,436-Speed 3438.58 samples/sec Loss 12.6423 LearningRate 0.1615 Epoch: 2 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:44,377-Speed 3482.04 samples/sec Loss 12.7711 LearningRate 0.1615 Epoch: 2 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:47,348-Speed 3448.35 samples/sec Loss 12.8426 LearningRate 0.1614 Epoch: 2 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:50,280-Speed 3493.73 samples/sec Loss 12.8173 LearningRate 0.1614 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:53,213-Speed 3492.33 samples/sec Loss 12.7873 LearningRate 0.1614 Epoch: 2 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:19:56,146-Speed 3491.60 samples/sec Loss 12.6440 LearningRate 0.1614 Epoch: 2 Global Step: 12870 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-19 18:19:59,076-Speed 3495.67 samples/sec Loss 12.6879 LearningRate 0.1613 Epoch: 2 Global Step: 12880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:20:02,031-Speed 3466.61 samples/sec Loss 12.7893 LearningRate 0.1613 Epoch: 2 Global Step: 12890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:20:04,964-Speed 3491.71 samples/sec Loss 12.4648 LearningRate 0.1613 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:20:07,935-Speed 3448.37 samples/sec Loss 12.8134 LearningRate 0.1612 Epoch: 2 Global Step: 12910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:20:10,898-Speed 3457.00 samples/sec Loss 12.7856 LearningRate 0.1612 Epoch: 2 Global Step: 12920 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:20:13,825-Speed 3498.71 samples/sec Loss 12.7706 LearningRate 0.1612 Epoch: 2 Global Step: 12930 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:20:16,771-Speed 3477.07 samples/sec Loss 12.7578 LearningRate 0.1612 Epoch: 2 Global Step: 12940 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:20:19,710-Speed 3484.60 samples/sec Loss 12.7515 LearningRate 0.1611 Epoch: 2 Global Step: 12950 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:20:22,645-Speed 3490.57 samples/sec Loss 12.6185 LearningRate 0.1611 Epoch: 2 Global Step: 12960 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:20:25,599-Speed 3466.93 samples/sec Loss 12.7092 LearningRate 0.1611 Epoch: 2 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:20:28,531-Speed 3493.37 samples/sec Loss 12.5607 LearningRate 0.1610 Epoch: 2 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:20:31,487-Speed 3465.85 samples/sec Loss 12.9288 LearningRate 0.1610 Epoch: 2 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-19 18:20:34,445-Speed 3462.45 samples/sec Loss 12.5432 LearningRate 0.1610 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:20:37,423-Speed 3439.63 samples/sec Loss 12.7212 LearningRate 0.1610 Epoch: 2 Global Step: 13010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:20:40,408-Speed 3431.36 samples/sec Loss 12.8131 LearningRate 0.1609 Epoch: 2 Global Step: 13020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:20:43,402-Speed 3420.89 samples/sec Loss 12.7745 LearningRate 0.1609 Epoch: 2 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:20:46,470-Speed 3403.41 samples/sec Loss 12.5892 LearningRate 0.1609 Epoch: 2 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:20:49,441-Speed 3447.32 samples/sec Loss 12.6457 LearningRate 0.1608 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:20:52,455-Speed 3465.76 samples/sec Loss 12.8462 LearningRate 0.1608 Epoch: 2 Global Step: 13060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:20:55,385-Speed 3495.13 samples/sec Loss 12.5975 LearningRate 0.1608 Epoch: 2 Global Step: 13070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:20:58,320-Speed 3490.18 samples/sec Loss 12.8044 LearningRate 0.1608 Epoch: 2 Global Step: 13080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:21:01,651-Speed 3494.85 samples/sec Loss 12.7082 LearningRate 0.1607 Epoch: 2 Global Step: 13090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:21:04,592-Speed 3482.63 samples/sec Loss 12.6474 LearningRate 0.1607 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:07,528-Speed 3489.05 samples/sec Loss 12.7292 LearningRate 0.1607 Epoch: 2 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:10,506-Speed 3439.94 
samples/sec Loss 12.8095 LearningRate 0.1607 Epoch: 2 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:13,450-Speed 3478.98 samples/sec Loss 12.6576 LearningRate 0.1606 Epoch: 2 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:16,449-Speed 3414.92 samples/sec Loss 12.5498 LearningRate 0.1606 Epoch: 2 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:19,386-Speed 3488.17 samples/sec Loss 12.5853 LearningRate 0.1606 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:22,317-Speed 3494.31 samples/sec Loss 12.6123 LearningRate 0.1605 Epoch: 2 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:25,268-Speed 3470.78 samples/sec Loss 12.6761 LearningRate 0.1605 Epoch: 2 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:28,206-Speed 3486.46 samples/sec Loss 12.5716 LearningRate 0.1605 Epoch: 2 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:31,150-Speed 3479.80 samples/sec Loss 12.6562 LearningRate 0.1605 Epoch: 2 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:34,086-Speed 3487.87 samples/sec Loss 12.8121 LearningRate 0.1604 Epoch: 2 Global Step: 13200 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:21:37,008-Speed 3505.58 samples/sec Loss 12.8694 LearningRate 0.1604 Epoch: 2 Global Step: 13210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:39,938-Speed 3495.75 samples/sec Loss 12.5704 LearningRate 0.1604 Epoch: 2 Global Step: 13220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:42,872-Speed 3491.03 samples/sec Loss 12.5308 LearningRate 0.1603 Epoch: 2 Global Step: 13230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:45,807-Speed 3489.78 samples/sec Loss 12.7077 
LearningRate 0.1603 Epoch: 2 Global Step: 13240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:48,740-Speed 3492.13 samples/sec Loss 12.7831 LearningRate 0.1603 Epoch: 2 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:51,679-Speed 3485.48 samples/sec Loss 12.6974 LearningRate 0.1603 Epoch: 2 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:54,616-Speed 3487.46 samples/sec Loss 12.7995 LearningRate 0.1602 Epoch: 2 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:21:57,573-Speed 3464.03 samples/sec Loss 12.6406 LearningRate 0.1602 Epoch: 2 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:00,583-Speed 3403.51 samples/sec Loss 12.5479 LearningRate 0.1602 Epoch: 2 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:03,577-Speed 3421.39 samples/sec Loss 12.6730 LearningRate 0.1601 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:06,508-Speed 3494.35 samples/sec Loss 12.7354 LearningRate 0.1601 Epoch: 2 Global Step: 13310 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:22:09,446-Speed 3485.31 samples/sec Loss 12.5723 LearningRate 0.1601 Epoch: 2 Global Step: 13320 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:22:12,382-Speed 3489.46 samples/sec Loss 12.7002 LearningRate 0.1601 Epoch: 2 Global Step: 13330 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:22:15,361-Speed 3438.32 samples/sec Loss 12.6933 LearningRate 0.1600 Epoch: 2 Global Step: 13340 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:22:18,337-Speed 3442.12 samples/sec Loss 12.7985 LearningRate 0.1600 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:22:21,274-Speed 3487.11 samples/sec Loss 12.6270 LearningRate 0.1600 Epoch: 2 
Global Step: 13360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:24,208-Speed 3490.76 samples/sec Loss 12.5464 LearningRate 0.1599 Epoch: 2 Global Step: 13370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:27,145-Speed 3488.29 samples/sec Loss 12.6019 LearningRate 0.1599 Epoch: 2 Global Step: 13380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:30,083-Speed 3486.47 samples/sec Loss 12.5231 LearningRate 0.1599 Epoch: 2 Global Step: 13390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:33,016-Speed 3492.35 samples/sec Loss 12.6130 LearningRate 0.1599 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:35,972-Speed 3465.38 samples/sec Loss 12.4731 LearningRate 0.1598 Epoch: 2 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:38,902-Speed 3495.35 samples/sec Loss 12.6900 LearningRate 0.1598 Epoch: 2 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:41,834-Speed 3492.59 samples/sec Loss 12.6854 LearningRate 0.1598 Epoch: 2 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:44,769-Speed 3489.87 samples/sec Loss 12.6787 LearningRate 0.1597 Epoch: 2 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:47,709-Speed 3484.62 samples/sec Loss 12.6899 LearningRate 0.1597 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:50,720-Speed 3400.79 samples/sec Loss 12.4479 LearningRate 0.1597 Epoch: 2 Global Step: 13460 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:22:53,714-Speed 3421.15 samples/sec Loss 12.7117 LearningRate 0.1597 Epoch: 2 Global Step: 13470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:22:56,639-Speed 3502.78 samples/sec Loss 12.5482 LearningRate 0.1596 Epoch: 2 Global Step: 13480 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-01-19 18:22:59,580-Speed 3482.50 samples/sec Loss 12.6762 LearningRate 0.1596 Epoch: 2 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:02,556-Speed 3441.75 samples/sec Loss 12.6565 LearningRate 0.1596 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:05,488-Speed 3493.20 samples/sec Loss 12.5592 LearningRate 0.1595 Epoch: 2 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:08,433-Speed 3478.48 samples/sec Loss 12.6397 LearningRate 0.1595 Epoch: 2 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:11,372-Speed 3485.03 samples/sec Loss 12.7100 LearningRate 0.1595 Epoch: 2 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:14,311-Speed 3484.11 samples/sec Loss 12.3500 LearningRate 0.1595 Epoch: 2 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:17,249-Speed 3486.39 samples/sec Loss 12.5798 LearningRate 0.1594 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:20,193-Speed 3479.61 samples/sec Loss 12.6421 LearningRate 0.1594 Epoch: 2 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:23,126-Speed 3492.10 samples/sec Loss 12.5702 LearningRate 0.1594 Epoch: 2 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:26,060-Speed 3492.07 samples/sec Loss 12.5493 LearningRate 0.1593 Epoch: 2 Global Step: 13580 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:23:29,001-Speed 3482.61 samples/sec Loss 12.5477 LearningRate 0.1593 Epoch: 2 Global Step: 13590 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:23:32,006-Speed 3407.97 samples/sec Loss 12.4669 LearningRate 0.1593 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 262144 Required: 11 
hours Training: 2022-01-19 18:23:35,011-Speed 3408.17 samples/sec Loss 12.3454 LearningRate 0.1593 Epoch: 2 Global Step: 13610 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:23:37,985-Speed 3444.55 samples/sec Loss 12.3250 LearningRate 0.1592 Epoch: 2 Global Step: 13620 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:23:40,911-Speed 3500.61 samples/sec Loss 12.4074 LearningRate 0.1592 Epoch: 2 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:43,911-Speed 3413.31 samples/sec Loss 12.6745 LearningRate 0.1592 Epoch: 2 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:46,951-Speed 3369.77 samples/sec Loss 12.6467 LearningRate 0.1592 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:49,883-Speed 3493.79 samples/sec Loss 12.5773 LearningRate 0.1591 Epoch: 2 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:52,813-Speed 3496.33 samples/sec Loss 12.5757 LearningRate 0.1591 Epoch: 2 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:55,761-Speed 3473.95 samples/sec Loss 12.6406 LearningRate 0.1591 Epoch: 2 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:23:58,698-Speed 3488.17 samples/sec Loss 12.6796 LearningRate 0.1590 Epoch: 2 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:01,641-Speed 3480.01 samples/sec Loss 12.5369 LearningRate 0.1590 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:04,581-Speed 3483.66 samples/sec Loss 12.5502 LearningRate 0.1590 Epoch: 2 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:07,515-Speed 3492.10 samples/sec Loss 12.7487 LearningRate 0.1590 Epoch: 2 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
18:24:10,467-Speed 3469.17 samples/sec Loss 12.4713 LearningRate 0.1589 Epoch: 2 Global Step: 13730 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:24:13,419-Speed 3469.37 samples/sec Loss 12.7140 LearningRate 0.1589 Epoch: 2 Global Step: 13740 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:24:16,347-Speed 3497.94 samples/sec Loss 12.5389 LearningRate 0.1589 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:19,290-Speed 3481.54 samples/sec Loss 12.5715 LearningRate 0.1588 Epoch: 2 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:22,219-Speed 3496.60 samples/sec Loss 12.3763 LearningRate 0.1588 Epoch: 2 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:25,157-Speed 3487.26 samples/sec Loss 12.5874 LearningRate 0.1588 Epoch: 2 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:28,154-Speed 3417.09 samples/sec Loss 12.3823 LearningRate 0.1588 Epoch: 2 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:31,093-Speed 3484.93 samples/sec Loss 12.6808 LearningRate 0.1587 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:34,063-Speed 3449.13 samples/sec Loss 12.5518 LearningRate 0.1587 Epoch: 2 Global Step: 13810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:37,004-Speed 3481.95 samples/sec Loss 12.5548 LearningRate 0.1587 Epoch: 2 Global Step: 13820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:39,963-Speed 3462.51 samples/sec Loss 12.5212 LearningRate 0.1586 Epoch: 2 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:42,931-Speed 3450.64 samples/sec Loss 12.3206 LearningRate 0.1586 Epoch: 2 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:45,888-Speed 3463.65 
samples/sec Loss 12.4665 LearningRate 0.1586 Epoch: 2 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:48,827-Speed 3485.04 samples/sec Loss 12.4365 LearningRate 0.1586 Epoch: 2 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:51,766-Speed 3485.28 samples/sec Loss 12.6690 LearningRate 0.1585 Epoch: 2 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:54,708-Speed 3481.87 samples/sec Loss 12.4286 LearningRate 0.1585 Epoch: 2 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:24:57,650-Speed 3482.06 samples/sec Loss 12.4267 LearningRate 0.1585 Epoch: 2 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:00,595-Speed 3477.15 samples/sec Loss 12.4339 LearningRate 0.1584 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:03,604-Speed 3404.46 samples/sec Loss 12.5066 LearningRate 0.1584 Epoch: 2 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:06,572-Speed 3450.90 samples/sec Loss 12.5369 LearningRate 0.1584 Epoch: 2 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:09,566-Speed 3421.33 samples/sec Loss 12.4512 LearningRate 0.1584 Epoch: 2 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:12,506-Speed 3484.61 samples/sec Loss 12.5096 LearningRate 0.1583 Epoch: 2 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:15,451-Speed 3478.16 samples/sec Loss 12.5043 LearningRate 0.1583 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:25:18,415-Speed 3455.29 samples/sec Loss 12.6184 LearningRate 0.1583 Epoch: 2 Global Step: 13960 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:25:21,351-Speed 3487.81 samples/sec Loss 12.5789 
LearningRate 0.1582 Epoch: 2 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:24,287-Speed 3488.71 samples/sec Loss 12.5874 LearningRate 0.1582 Epoch: 2 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:27,250-Speed 3457.96 samples/sec Loss 12.5280 LearningRate 0.1582 Epoch: 2 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:25:30,203-Speed 3468.13 samples/sec Loss 12.4019 LearningRate 0.1582 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:26:13,292-[lfw][14000]XNorm: 21.773512 Training: 2022-01-19 18:26:13,293-[lfw][14000]Accuracy-Flip: 0.99467+-0.00348 Training: 2022-01-19 18:26:13,293-[lfw][14000]Accuracy-Highest: 0.99467 Training: 2022-01-19 18:27:03,279-[cfp_fp][14000]XNorm: 18.587893 Training: 2022-01-19 18:27:03,280-[cfp_fp][14000]Accuracy-Flip: 0.94429+-0.01145 Training: 2022-01-19 18:27:03,280-[cfp_fp][14000]Accuracy-Highest: 0.94429 Training: 2022-01-19 18:27:46,272-[agedb_30][14000]XNorm: 21.226571 Training: 2022-01-19 18:27:46,272-[agedb_30][14000]Accuracy-Flip: 0.95500+-0.01038 Training: 2022-01-19 18:27:46,273-[agedb_30][14000]Accuracy-Highest: 0.96017 Training: 2022-01-19 18:27:49,215-Speed 73.66 samples/sec Loss 12.5170 LearningRate 0.1581 Epoch: 2 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:27:52,140-Speed 3502.02 samples/sec Loss 12.5142 LearningRate 0.1581 Epoch: 2 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:27:55,088-Speed 3474.33 samples/sec Loss 12.3173 LearningRate 0.1581 Epoch: 2 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:27:58,016-Speed 3498.50 samples/sec Loss 12.2903 LearningRate 0.1581 Epoch: 2 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:00,947-Speed 3494.14 samples/sec Loss 12.4747 LearningRate 0.1580 Epoch: 
2 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:03,883-Speed 3488.80 samples/sec Loss 12.6222 LearningRate 0.1580 Epoch: 2 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:06,819-Speed 3488.15 samples/sec Loss 12.4832 LearningRate 0.1580 Epoch: 2 Global Step: 14070 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:28:09,750-Speed 3494.80 samples/sec Loss 12.3108 LearningRate 0.1579 Epoch: 2 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:12,688-Speed 3485.96 samples/sec Loss 12.5814 LearningRate 0.1579 Epoch: 2 Global Step: 14090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:15,736-Speed 3361.65 samples/sec Loss 12.3679 LearningRate 0.1579 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:18,683-Speed 3477.28 samples/sec Loss 12.3322 LearningRate 0.1579 Epoch: 2 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:21,639-Speed 3463.92 samples/sec Loss 12.5065 LearningRate 0.1578 Epoch: 2 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:24,577-Speed 3487.03 samples/sec Loss 12.5246 LearningRate 0.1578 Epoch: 2 Global Step: 14130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:27,598-Speed 3389.91 samples/sec Loss 12.4903 LearningRate 0.1578 Epoch: 2 Global Step: 14140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:30,532-Speed 3490.61 samples/sec Loss 12.2870 LearningRate 0.1577 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:33,471-Speed 3486.04 samples/sec Loss 12.6214 LearningRate 0.1577 Epoch: 2 Global Step: 14160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:36,405-Speed 3490.71 samples/sec Loss 12.4969 LearningRate 0.1577 Epoch: 2 Global Step: 14170 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:39,343-Speed 3485.66 samples/sec Loss 12.3743 LearningRate 0.1577 Epoch: 2 Global Step: 14180 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:28:42,268-Speed 3503.05 samples/sec Loss 12.5656 LearningRate 0.1576 Epoch: 2 Global Step: 14190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:45,223-Speed 3465.85 samples/sec Loss 12.3963 LearningRate 0.1576 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:48,176-Speed 3467.84 samples/sec Loss 12.4939 LearningRate 0.1576 Epoch: 2 Global Step: 14210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:51,182-Speed 3408.21 samples/sec Loss 12.3730 LearningRate 0.1575 Epoch: 2 Global Step: 14220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:54,188-Speed 3407.35 samples/sec Loss 12.5247 LearningRate 0.1575 Epoch: 2 Global Step: 14230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:28:57,129-Speed 3482.07 samples/sec Loss 12.5143 LearningRate 0.1575 Epoch: 2 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:00,070-Speed 3482.77 samples/sec Loss 12.4299 LearningRate 0.1575 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:03,095-Speed 3385.87 samples/sec Loss 12.3686 LearningRate 0.1574 Epoch: 2 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:06,094-Speed 3415.74 samples/sec Loss 12.4093 LearningRate 0.1574 Epoch: 2 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:09,065-Speed 3447.52 samples/sec Loss 12.3392 LearningRate 0.1574 Epoch: 2 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:12,032-Speed 3452.84 samples/sec Loss 12.2909 LearningRate 0.1574 Epoch: 2 Global Step: 14290 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-19 18:29:15,026-Speed 3420.13 samples/sec Loss 12.5376 LearningRate 0.1573 Epoch: 2 Global Step: 14300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:17,995-Speed 3450.62 samples/sec Loss 12.4931 LearningRate 0.1573 Epoch: 2 Global Step: 14310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:21,006-Speed 3401.71 samples/sec Loss 12.6826 LearningRate 0.1573 Epoch: 2 Global Step: 14320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:23,942-Speed 3487.87 samples/sec Loss 12.5973 LearningRate 0.1572 Epoch: 2 Global Step: 14330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:26,882-Speed 3484.57 samples/sec Loss 12.6224 LearningRate 0.1572 Epoch: 2 Global Step: 14340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:29,837-Speed 3466.19 samples/sec Loss 12.4870 LearningRate 0.1572 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:32,837-Speed 3413.79 samples/sec Loss 12.4183 LearningRate 0.1572 Epoch: 2 Global Step: 14360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:35,785-Speed 3474.88 samples/sec Loss 12.7450 LearningRate 0.1571 Epoch: 2 Global Step: 14370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:38,751-Speed 3453.28 samples/sec Loss 12.4916 LearningRate 0.1571 Epoch: 2 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:29:41,677-Speed 3501.06 samples/sec Loss 12.4202 LearningRate 0.1571 Epoch: 2 Global Step: 14390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:44,610-Speed 3491.97 samples/sec Loss 12.5887 LearningRate 0.1570 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:47,541-Speed 3494.51 samples/sec Loss 12.3479 LearningRate 0.1570 Epoch: 2 Global Step: 14410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:50,478-Speed 
3487.99 samples/sec Loss 12.4738 LearningRate 0.1570 Epoch: 2 Global Step: 14420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:53,463-Speed 3430.48 samples/sec Loss 12.6897 LearningRate 0.1570 Epoch: 2 Global Step: 14430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:56,419-Speed 3464.99 samples/sec Loss 12.6058 LearningRate 0.1569 Epoch: 2 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:29:59,349-Speed 3496.26 samples/sec Loss 12.5466 LearningRate 0.1569 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:02,282-Speed 3491.73 samples/sec Loss 12.4078 LearningRate 0.1569 Epoch: 2 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:05,221-Speed 3486.10 samples/sec Loss 12.4867 LearningRate 0.1568 Epoch: 2 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:08,176-Speed 3465.74 samples/sec Loss 12.3815 LearningRate 0.1568 Epoch: 2 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:11,112-Speed 3489.30 samples/sec Loss 12.6709 LearningRate 0.1568 Epoch: 2 Global Step: 14490 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:30:14,056-Speed 3479.19 samples/sec Loss 12.4982 LearningRate 0.1568 Epoch: 2 Global Step: 14500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:16,987-Speed 3494.60 samples/sec Loss 12.6088 LearningRate 0.1567 Epoch: 2 Global Step: 14510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:19,930-Speed 3479.95 samples/sec Loss 12.2020 LearningRate 0.1567 Epoch: 2 Global Step: 14520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:22,861-Speed 3494.36 samples/sec Loss 12.4512 LearningRate 0.1567 Epoch: 2 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:25,801-Speed 3484.06 samples/sec Loss 
12.4671 LearningRate 0.1566 Epoch: 2 Global Step: 14540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:28,828-Speed 3384.38 samples/sec Loss 12.5022 LearningRate 0.1566 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:31,790-Speed 3456.91 samples/sec Loss 12.5631 LearningRate 0.1566 Epoch: 2 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:34,729-Speed 3486.19 samples/sec Loss 12.5275 LearningRate 0.1566 Epoch: 2 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:37,667-Speed 3485.70 samples/sec Loss 12.3944 LearningRate 0.1565 Epoch: 2 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:40,599-Speed 3494.05 samples/sec Loss 12.4061 LearningRate 0.1565 Epoch: 2 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:43,524-Speed 3501.52 samples/sec Loss 12.3265 LearningRate 0.1565 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:46,473-Speed 3473.08 samples/sec Loss 12.3656 LearningRate 0.1565 Epoch: 2 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:49,411-Speed 3485.89 samples/sec Loss 12.5171 LearningRate 0.1564 Epoch: 2 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:52,355-Speed 3478.85 samples/sec Loss 12.5145 LearningRate 0.1564 Epoch: 2 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:55,289-Speed 3491.43 samples/sec Loss 12.4908 LearningRate 0.1564 Epoch: 2 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:30:58,239-Speed 3472.83 samples/sec Loss 12.4654 LearningRate 0.1563 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:01,172-Speed 3493.08 samples/sec Loss 12.3434 LearningRate 0.1563 
Epoch: 2 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:04,103-Speed 3494.56 samples/sec Loss 12.5016 LearningRate 0.1563 Epoch: 2 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:07,035-Speed 3494.05 samples/sec Loss 12.3642 LearningRate 0.1563 Epoch: 2 Global Step: 14680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:10,024-Speed 3426.92 samples/sec Loss 12.4447 LearningRate 0.1562 Epoch: 2 Global Step: 14690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:12,956-Speed 3492.75 samples/sec Loss 12.4385 LearningRate 0.1562 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:31:15,937-Speed 3436.68 samples/sec Loss 12.3624 LearningRate 0.1562 Epoch: 2 Global Step: 14710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:18,945-Speed 3404.74 samples/sec Loss 12.3112 LearningRate 0.1561 Epoch: 2 Global Step: 14720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:21,927-Speed 3434.61 samples/sec Loss 12.5413 LearningRate 0.1561 Epoch: 2 Global Step: 14730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:24,862-Speed 3490.95 samples/sec Loss 12.4495 LearningRate 0.1561 Epoch: 2 Global Step: 14740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:27,813-Speed 3470.88 samples/sec Loss 12.3467 LearningRate 0.1561 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:30,747-Speed 3491.74 samples/sec Loss 12.4958 LearningRate 0.1560 Epoch: 2 Global Step: 14760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:33,683-Speed 3488.02 samples/sec Loss 12.3974 LearningRate 0.1560 Epoch: 2 Global Step: 14770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:36,614-Speed 3495.44 samples/sec Loss 12.3832 LearningRate 0.1560 Epoch: 2 Global Step: 14780 
Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:39,551-Speed 3487.20 samples/sec Loss 12.3062 LearningRate 0.1560 Epoch: 2 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:42,497-Speed 3476.52 samples/sec Loss 12.3321 LearningRate 0.1559 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:31:45,429-Speed 3493.20 samples/sec Loss 12.5582 LearningRate 0.1559 Epoch: 2 Global Step: 14810 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:31:48,364-Speed 3489.37 samples/sec Loss 12.3015 LearningRate 0.1559 Epoch: 2 Global Step: 14820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:31:51,338-Speed 3444.47 samples/sec Loss 12.2615 LearningRate 0.1558 Epoch: 2 Global Step: 14830 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:31:54,306-Speed 3451.26 samples/sec Loss 12.2778 LearningRate 0.1558 Epoch: 2 Global Step: 14840 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:31:57,231-Speed 3501.39 samples/sec Loss 12.2742 LearningRate 0.1558 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:00,179-Speed 3475.37 samples/sec Loss 12.3581 LearningRate 0.1558 Epoch: 2 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:03,122-Speed 3480.20 samples/sec Loss 12.2431 LearningRate 0.1557 Epoch: 2 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:06,053-Speed 3494.44 samples/sec Loss 12.4356 LearningRate 0.1557 Epoch: 2 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:08,984-Speed 3494.89 samples/sec Loss 12.3523 LearningRate 0.1557 Epoch: 2 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:11,917-Speed 3492.52 samples/sec Loss 12.3339 LearningRate 0.1556 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-19 18:32:14,845-Speed 3497.80 samples/sec Loss 12.2856 LearningRate 0.1556 Epoch: 2 Global Step: 14910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:17,784-Speed 3485.12 samples/sec Loss 12.1739 LearningRate 0.1556 Epoch: 2 Global Step: 14920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:20,719-Speed 3490.32 samples/sec Loss 12.3591 LearningRate 0.1556 Epoch: 2 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:23,692-Speed 3444.95 samples/sec Loss 12.4378 LearningRate 0.1555 Epoch: 2 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:26,617-Speed 3503.05 samples/sec Loss 12.4607 LearningRate 0.1555 Epoch: 2 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:29,549-Speed 3493.63 samples/sec Loss 12.3442 LearningRate 0.1555 Epoch: 2 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:32,507-Speed 3462.74 samples/sec Loss 12.3475 LearningRate 0.1554 Epoch: 2 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:35,438-Speed 3495.26 samples/sec Loss 12.3288 LearningRate 0.1554 Epoch: 2 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:38,379-Speed 3483.05 samples/sec Loss 12.3002 LearningRate 0.1554 Epoch: 2 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:41,344-Speed 3454.84 samples/sec Loss 12.6519 LearningRate 0.1554 Epoch: 2 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:44,296-Speed 3469.07 samples/sec Loss 12.2594 LearningRate 0.1553 Epoch: 2 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:47,234-Speed 3486.03 samples/sec Loss 12.4837 LearningRate 0.1553 Epoch: 2 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-19 18:32:50,182-Speed 3475.44 samples/sec Loss 12.2902 LearningRate 0.1553 Epoch: 2 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:53,170-Speed 3428.30 samples/sec Loss 12.1685 LearningRate 0.1553 Epoch: 2 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:56,089-Speed 3509.24 samples/sec Loss 12.2127 LearningRate 0.1552 Epoch: 2 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:32:59,029-Speed 3483.55 samples/sec Loss 12.4467 LearningRate 0.1552 Epoch: 2 Global Step: 15060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:01,961-Speed 3493.10 samples/sec Loss 12.1687 LearningRate 0.1552 Epoch: 2 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:04,900-Speed 3486.15 samples/sec Loss 12.4240 LearningRate 0.1551 Epoch: 2 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:07,843-Speed 3480.19 samples/sec Loss 12.4804 LearningRate 0.1551 Epoch: 2 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:10,788-Speed 3477.40 samples/sec Loss 12.2634 LearningRate 0.1551 Epoch: 2 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:13,755-Speed 3451.96 samples/sec Loss 12.2985 LearningRate 0.1551 Epoch: 2 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:16,703-Speed 3475.64 samples/sec Loss 12.3060 LearningRate 0.1550 Epoch: 2 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:19,674-Speed 3446.83 samples/sec Loss 12.1255 LearningRate 0.1550 Epoch: 2 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:22,681-Speed 3406.57 samples/sec Loss 12.1971 LearningRate 0.1550 Epoch: 2 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:25,657-Speed 
3442.18 samples/sec Loss 12.4487 LearningRate 0.1549 Epoch: 2 Global Step: 15150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:33:28,615-Speed 3463.00 samples/sec Loss 12.4577 LearningRate 0.1549 Epoch: 2 Global Step: 15160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:33:31,626-Speed 3401.12 samples/sec Loss 12.1552 LearningRate 0.1549 Epoch: 2 Global Step: 15170 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:33:45,357-Speed 745.86 samples/sec Loss 11.9367 LearningRate 0.1549 Epoch: 3 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:48,295-Speed 3485.88 samples/sec Loss 11.4622 LearningRate 0.1548 Epoch: 3 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:51,262-Speed 3451.87 samples/sec Loss 11.4579 LearningRate 0.1548 Epoch: 3 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:54,205-Speed 3480.16 samples/sec Loss 11.4244 LearningRate 0.1548 Epoch: 3 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:33:57,145-Speed 3483.88 samples/sec Loss 11.5792 LearningRate 0.1548 Epoch: 3 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:00,089-Speed 3479.71 samples/sec Loss 11.4997 LearningRate 0.1547 Epoch: 3 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:03,067-Speed 3439.71 samples/sec Loss 11.6169 LearningRate 0.1547 Epoch: 3 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:06,003-Speed 3488.19 samples/sec Loss 11.5429 LearningRate 0.1547 Epoch: 3 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:08,958-Speed 3465.60 samples/sec Loss 11.8662 LearningRate 0.1546 Epoch: 3 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:11,963-Speed 3409.14 samples/sec Loss 11.6516 
LearningRate 0.1546 Epoch: 3 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:14,942-Speed 3438.42 samples/sec Loss 11.8595 LearningRate 0.1546 Epoch: 3 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:17,872-Speed 3495.15 samples/sec Loss 11.7684 LearningRate 0.1546 Epoch: 3 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:20,805-Speed 3493.00 samples/sec Loss 11.8411 LearningRate 0.1545 Epoch: 3 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:23,738-Speed 3491.19 samples/sec Loss 11.8635 LearningRate 0.1545 Epoch: 3 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:26,691-Speed 3468.95 samples/sec Loss 11.8508 LearningRate 0.1545 Epoch: 3 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:29,625-Speed 3491.29 samples/sec Loss 11.8211 LearningRate 0.1544 Epoch: 3 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:32,563-Speed 3486.74 samples/sec Loss 11.8576 LearningRate 0.1544 Epoch: 3 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:35,538-Speed 3443.13 samples/sec Loss 11.8405 LearningRate 0.1544 Epoch: 3 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:38,520-Speed 3434.55 samples/sec Loss 11.7366 LearningRate 0.1544 Epoch: 3 Global Step: 15360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:41,473-Speed 3468.87 samples/sec Loss 11.7936 LearningRate 0.1543 Epoch: 3 Global Step: 15370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:44,410-Speed 3486.53 samples/sec Loss 11.8109 LearningRate 0.1543 Epoch: 3 Global Step: 15380 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:34:47,335-Speed 3501.62 samples/sec Loss 11.9499 LearningRate 0.1543 Epoch: 3 
Global Step: 15390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:50,270-Speed 3490.38 samples/sec Loss 12.0480 LearningRate 0.1543 Epoch: 3 Global Step: 15400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:53,280-Speed 3403.55 samples/sec Loss 11.7196 LearningRate 0.1542 Epoch: 3 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:56,314-Speed 3375.76 samples/sec Loss 11.8373 LearningRate 0.1542 Epoch: 3 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:34:59,268-Speed 3467.17 samples/sec Loss 11.7721 LearningRate 0.1542 Epoch: 3 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:02,206-Speed 3486.89 samples/sec Loss 11.8507 LearningRate 0.1541 Epoch: 3 Global Step: 15440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:05,185-Speed 3437.93 samples/sec Loss 11.9277 LearningRate 0.1541 Epoch: 3 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:08,183-Speed 3417.31 samples/sec Loss 11.8784 LearningRate 0.1541 Epoch: 3 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:11,125-Speed 3481.05 samples/sec Loss 11.8572 LearningRate 0.1541 Epoch: 3 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:14,051-Speed 3500.20 samples/sec Loss 11.9340 LearningRate 0.1540 Epoch: 3 Global Step: 15480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:16,989-Speed 3486.88 samples/sec Loss 11.9818 LearningRate 0.1540 Epoch: 3 Global Step: 15490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:19,925-Speed 3489.70 samples/sec Loss 12.0857 LearningRate 0.1540 Epoch: 3 Global Step: 15500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:22,925-Speed 3413.67 samples/sec Loss 12.1052 LearningRate 0.1539 Epoch: 3 Global Step: 15510 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:26,047-Speed 3281.07 samples/sec Loss 12.0809 LearningRate 0.1539 Epoch: 3 Global Step: 15520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:29,034-Speed 3428.11 samples/sec Loss 11.9700 LearningRate 0.1539 Epoch: 3 Global Step: 15530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:31,967-Speed 3493.58 samples/sec Loss 12.1190 LearningRate 0.1539 Epoch: 3 Global Step: 15540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:34,916-Speed 3473.60 samples/sec Loss 11.9876 LearningRate 0.1538 Epoch: 3 Global Step: 15550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:37,848-Speed 3493.20 samples/sec Loss 12.0182 LearningRate 0.1538 Epoch: 3 Global Step: 15560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:40,799-Speed 3470.92 samples/sec Loss 11.8609 LearningRate 0.1538 Epoch: 3 Global Step: 15570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:35:43,736-Speed 3487.83 samples/sec Loss 11.8354 LearningRate 0.1538 Epoch: 3 Global Step: 15580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:46,695-Speed 3463.54 samples/sec Loss 12.0622 LearningRate 0.1537 Epoch: 3 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:49,631-Speed 3488.39 samples/sec Loss 12.0900 LearningRate 0.1537 Epoch: 3 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:52,566-Speed 3489.59 samples/sec Loss 12.1820 LearningRate 0.1537 Epoch: 3 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:55,500-Speed 3491.81 samples/sec Loss 12.0579 LearningRate 0.1536 Epoch: 3 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:35:58,434-Speed 3490.92 samples/sec Loss 11.9511 LearningRate 0.1536 Epoch: 3 Global Step: 15630 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-01-19 18:36:01,373-Speed 3484.94 samples/sec Loss 12.0584 LearningRate 0.1536 Epoch: 3 Global Step: 15640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:04,403-Speed 3379.65 samples/sec Loss 12.1422 LearningRate 0.1536 Epoch: 3 Global Step: 15650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:07,378-Speed 3442.97 samples/sec Loss 12.0331 LearningRate 0.1535 Epoch: 3 Global Step: 15660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:10,326-Speed 3475.27 samples/sec Loss 12.2407 LearningRate 0.1535 Epoch: 3 Global Step: 15670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:13,260-Speed 3491.49 samples/sec Loss 12.0055 LearningRate 0.1535 Epoch: 3 Global Step: 15680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:36:16,195-Speed 3490.58 samples/sec Loss 11.9638 LearningRate 0.1534 Epoch: 3 Global Step: 15690 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:36:19,178-Speed 3433.05 samples/sec Loss 11.7743 LearningRate 0.1534 Epoch: 3 Global Step: 15700 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:36:22,101-Speed 3503.53 samples/sec Loss 12.0276 LearningRate 0.1534 Epoch: 3 Global Step: 15710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:25,049-Speed 3474.14 samples/sec Loss 12.1709 LearningRate 0.1534 Epoch: 3 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:27,997-Speed 3474.35 samples/sec Loss 12.0447 LearningRate 0.1533 Epoch: 3 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:30,934-Speed 3488.57 samples/sec Loss 11.9816 LearningRate 0.1533 Epoch: 3 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:33,874-Speed 3483.01 samples/sec Loss 12.0083 LearningRate 0.1533 Epoch: 3 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
18:36:36,807-Speed 3492.39 samples/sec Loss 12.1643 LearningRate 0.1533 Epoch: 3 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:39,743-Speed 3489.62 samples/sec Loss 11.8013 LearningRate 0.1532 Epoch: 3 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:42,679-Speed 3488.31 samples/sec Loss 12.1172 LearningRate 0.1532 Epoch: 3 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:45,622-Speed 3480.24 samples/sec Loss 12.1924 LearningRate 0.1532 Epoch: 3 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:48,570-Speed 3474.21 samples/sec Loss 12.1641 LearningRate 0.1531 Epoch: 3 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:51,492-Speed 3507.09 samples/sec Loss 12.2144 LearningRate 0.1531 Epoch: 3 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:54,430-Speed 3486.31 samples/sec Loss 12.0852 LearningRate 0.1531 Epoch: 3 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:36:57,382-Speed 3469.79 samples/sec Loss 12.1005 LearningRate 0.1531 Epoch: 3 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:00,325-Speed 3480.97 samples/sec Loss 11.9968 LearningRate 0.1530 Epoch: 3 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:03,259-Speed 3490.15 samples/sec Loss 11.8868 LearningRate 0.1530 Epoch: 3 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:06,192-Speed 3493.09 samples/sec Loss 12.1376 LearningRate 0.1530 Epoch: 3 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:09,131-Speed 3485.57 samples/sec Loss 12.0935 LearningRate 0.1529 Epoch: 3 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:12,069-Speed 3486.99 
samples/sec Loss 12.1812 LearningRate 0.1529 Epoch: 3 Global Step: 15880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:15,003-Speed 3490.65 samples/sec Loss 12.2140 LearningRate 0.1529 Epoch: 3 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:17,938-Speed 3489.22 samples/sec Loss 12.1308 LearningRate 0.1529 Epoch: 3 Global Step: 15900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:20,918-Speed 3437.39 samples/sec Loss 12.3108 LearningRate 0.1528 Epoch: 3 Global Step: 15910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:37:23,845-Speed 3498.91 samples/sec Loss 12.0217 LearningRate 0.1528 Epoch: 3 Global Step: 15920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:26,783-Speed 3487.15 samples/sec Loss 12.0451 LearningRate 0.1528 Epoch: 3 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:29,730-Speed 3475.52 samples/sec Loss 12.2248 LearningRate 0.1528 Epoch: 3 Global Step: 15940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:32,666-Speed 3488.60 samples/sec Loss 12.2336 LearningRate 0.1527 Epoch: 3 Global Step: 15950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:35,601-Speed 3489.92 samples/sec Loss 12.0479 LearningRate 0.1527 Epoch: 3 Global Step: 15960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:38,548-Speed 3476.21 samples/sec Loss 12.2578 LearningRate 0.1527 Epoch: 3 Global Step: 15970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:41,605-Speed 3350.15 samples/sec Loss 12.2217 LearningRate 0.1526 Epoch: 3 Global Step: 15980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:44,556-Speed 3470.95 samples/sec Loss 12.2252 LearningRate 0.1526 Epoch: 3 Global Step: 15990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:37:47,502-Speed 3477.24 samples/sec Loss 12.0203 
LearningRate 0.1526 Epoch: 3 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 18:38:30,182-[lfw][16000]XNorm: 22.725410
Training: 2022-01-19 18:38:30,183-[lfw][16000]Accuracy-Flip: 0.99583+-0.00291
Training: 2022-01-19 18:38:30,183-[lfw][16000]Accuracy-Highest: 0.99583
Training: 2022-01-19 18:39:19,961-[cfp_fp][16000]XNorm: 19.741703
Training: 2022-01-19 18:39:19,962-[cfp_fp][16000]Accuracy-Flip: 0.94271+-0.01577
Training: 2022-01-19 18:39:19,962-[cfp_fp][16000]Accuracy-Highest: 0.94429
Training: 2022-01-19 18:40:02,795-[agedb_30][16000]XNorm: 21.918580
Training: 2022-01-19 18:40:02,796-[agedb_30][16000]Accuracy-Flip: 0.95933+-0.01036
Training: 2022-01-19 18:40:02,796-[agedb_30][16000]Accuracy-Highest: 0.96017
Training: 2022-01-19 18:40:05,735-Speed 74.08 samples/sec Loss 12.1081 LearningRate 0.1526 Epoch: 3 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 18:40:08,660-Speed 3501.15 samples/sec Loss 12.0400 LearningRate 0.1525 Epoch: 3 Global Step: 16020 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-01-19 18:40:11,572-Speed 3516.82 samples/sec Loss 12.2571 LearningRate 0.1525 Epoch: 3 Global Step: 16030 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 18:40:14,499-Speed 3500.02 samples/sec Loss 11.8970 LearningRate 0.1525 Epoch: 3 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 18:40:17,473-Speed 3443.74 samples/sec Loss 12.0781 LearningRate 0.1525 Epoch: 3 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 18:40:20,399-Speed 3501.73 samples/sec Loss 12.1243 LearningRate 0.1524 Epoch: 3 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 18:40:23,348-Speed 3472.80 samples/sec Loss 12.1693 LearningRate 0.1524 Epoch: 3 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 18:40:26,293-Speed 3477.81 samples/sec Loss 11.9097 LearningRate 0.1524 Epoch:
3 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:40:29,236-Speed 3480.66 samples/sec Loss 11.9591 LearningRate 0.1523 Epoch: 3 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:40:32,174-Speed 3486.34 samples/sec Loss 12.0816 LearningRate 0.1523 Epoch: 3 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:40:35,119-Speed 3478.67 samples/sec Loss 12.1550 LearningRate 0.1523 Epoch: 3 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:40:38,053-Speed 3490.86 samples/sec Loss 12.1640 LearningRate 0.1523 Epoch: 3 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:40:40,988-Speed 3489.75 samples/sec Loss 12.2487 LearningRate 0.1522 Epoch: 3 Global Step: 16130 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:40:43,917-Speed 3496.53 samples/sec Loss 11.9782 LearningRate 0.1522 Epoch: 3 Global Step: 16140 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:40:46,866-Speed 3472.85 samples/sec Loss 12.1437 LearningRate 0.1522 Epoch: 3 Global Step: 16150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:40:49,886-Speed 3392.60 samples/sec Loss 12.0979 LearningRate 0.1521 Epoch: 3 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:40:52,874-Speed 3427.94 samples/sec Loss 12.0067 LearningRate 0.1521 Epoch: 3 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:40:55,808-Speed 3490.52 samples/sec Loss 11.9216 LearningRate 0.1521 Epoch: 3 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:40:58,745-Speed 3487.72 samples/sec Loss 12.2428 LearningRate 0.1521 Epoch: 3 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:01,688-Speed 3480.36 samples/sec Loss 12.1135 LearningRate 0.1520 Epoch: 3 Global Step: 16200 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:04,648-Speed 3460.37 samples/sec Loss 12.1530 LearningRate 0.1520 Epoch: 3 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:07,586-Speed 3486.47 samples/sec Loss 12.1815 LearningRate 0.1520 Epoch: 3 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:10,528-Speed 3481.59 samples/sec Loss 12.0824 LearningRate 0.1520 Epoch: 3 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:13,472-Speed 3479.70 samples/sec Loss 12.0774 LearningRate 0.1519 Epoch: 3 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:16,419-Speed 3474.97 samples/sec Loss 12.2045 LearningRate 0.1519 Epoch: 3 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:19,353-Speed 3491.36 samples/sec Loss 12.0907 LearningRate 0.1519 Epoch: 3 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:22,307-Speed 3467.62 samples/sec Loss 12.2118 LearningRate 0.1518 Epoch: 3 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:25,256-Speed 3473.53 samples/sec Loss 12.0759 LearningRate 0.1518 Epoch: 3 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:28,264-Speed 3404.81 samples/sec Loss 12.0155 LearningRate 0.1518 Epoch: 3 Global Step: 16290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:31,280-Speed 3396.52 samples/sec Loss 12.0039 LearningRate 0.1518 Epoch: 3 Global Step: 16300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:34,209-Speed 3497.26 samples/sec Loss 12.1181 LearningRate 0.1517 Epoch: 3 Global Step: 16310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:37,141-Speed 3492.44 samples/sec Loss 11.9200 LearningRate 0.1517 Epoch: 3 Global Step: 16320 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-01-19 18:41:40,073-Speed 3493.81 samples/sec Loss 12.1177 LearningRate 0.1517 Epoch: 3 Global Step: 16330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:43,003-Speed 3496.53 samples/sec Loss 11.9568 LearningRate 0.1517 Epoch: 3 Global Step: 16340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:45,951-Speed 3473.90 samples/sec Loss 12.2018 LearningRate 0.1516 Epoch: 3 Global Step: 16350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:48,892-Speed 3482.83 samples/sec Loss 12.1051 LearningRate 0.1516 Epoch: 3 Global Step: 16360 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:41:51,815-Speed 3504.07 samples/sec Loss 12.1880 LearningRate 0.1516 Epoch: 3 Global Step: 16370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:54,766-Speed 3471.02 samples/sec Loss 11.8157 LearningRate 0.1515 Epoch: 3 Global Step: 16380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:41:57,736-Speed 3449.26 samples/sec Loss 12.0316 LearningRate 0.1515 Epoch: 3 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:00,699-Speed 3456.36 samples/sec Loss 11.8651 LearningRate 0.1515 Epoch: 3 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:03,662-Speed 3457.37 samples/sec Loss 11.9264 LearningRate 0.1515 Epoch: 3 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:06,617-Speed 3465.74 samples/sec Loss 12.0359 LearningRate 0.1514 Epoch: 3 Global Step: 16420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:09,561-Speed 3479.58 samples/sec Loss 12.1494 LearningRate 0.1514 Epoch: 3 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:12,516-Speed 3466.08 samples/sec Loss 12.0078 LearningRate 0.1514 Epoch: 3 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
18:42:15,448-Speed 3494.25 samples/sec Loss 11.9297 LearningRate 0.1513 Epoch: 3 Global Step: 16450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:18,380-Speed 3492.48 samples/sec Loss 12.0006 LearningRate 0.1513 Epoch: 3 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:21,321-Speed 3482.93 samples/sec Loss 12.0172 LearningRate 0.1513 Epoch: 3 Global Step: 16470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:42:24,258-Speed 3487.97 samples/sec Loss 12.0789 LearningRate 0.1513 Epoch: 3 Global Step: 16480 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:42:27,193-Speed 3489.94 samples/sec Loss 11.9338 LearningRate 0.1512 Epoch: 3 Global Step: 16490 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:42:30,131-Speed 3486.17 samples/sec Loss 12.0663 LearningRate 0.1512 Epoch: 3 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:33,184-Speed 3354.71 samples/sec Loss 12.1928 LearningRate 0.1512 Epoch: 3 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:36,163-Speed 3438.53 samples/sec Loss 12.0299 LearningRate 0.1512 Epoch: 3 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:39,092-Speed 3497.28 samples/sec Loss 12.1097 LearningRate 0.1511 Epoch: 3 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:42,023-Speed 3494.33 samples/sec Loss 11.8981 LearningRate 0.1511 Epoch: 3 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:45,008-Speed 3431.09 samples/sec Loss 12.2494 LearningRate 0.1511 Epoch: 3 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:47,937-Speed 3497.22 samples/sec Loss 12.1512 LearningRate 0.1510 Epoch: 3 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:50,885-Speed 3475.33 
samples/sec Loss 12.0418 LearningRate 0.1510 Epoch: 3 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:53,818-Speed 3492.22 samples/sec Loss 11.8995 LearningRate 0.1510 Epoch: 3 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:56,756-Speed 3486.85 samples/sec Loss 11.9108 LearningRate 0.1510 Epoch: 3 Global Step: 16590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:42:59,688-Speed 3493.51 samples/sec Loss 12.0323 LearningRate 0.1509 Epoch: 3 Global Step: 16600 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:43:02,668-Speed 3437.02 samples/sec Loss 12.2273 LearningRate 0.1509 Epoch: 3 Global Step: 16610 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:43:05,593-Speed 3501.81 samples/sec Loss 12.0184 LearningRate 0.1509 Epoch: 3 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:08,523-Speed 3496.12 samples/sec Loss 12.0087 LearningRate 0.1509 Epoch: 3 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:11,464-Speed 3481.91 samples/sec Loss 12.1453 LearningRate 0.1508 Epoch: 3 Global Step: 16640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:14,402-Speed 3486.78 samples/sec Loss 11.8466 LearningRate 0.1508 Epoch: 3 Global Step: 16650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:17,370-Speed 3450.36 samples/sec Loss 12.0384 LearningRate 0.1508 Epoch: 3 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:20,309-Speed 3485.89 samples/sec Loss 11.9824 LearningRate 0.1507 Epoch: 3 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:23,237-Speed 3497.74 samples/sec Loss 12.1296 LearningRate 0.1507 Epoch: 3 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:26,166-Speed 3497.22 samples/sec Loss 12.2467 
LearningRate 0.1507 Epoch: 3 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:29,149-Speed 3433.88 samples/sec Loss 11.9463 LearningRate 0.1507 Epoch: 3 Global Step: 16700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:32,139-Speed 3425.11 samples/sec Loss 12.0706 LearningRate 0.1506 Epoch: 3 Global Step: 16710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:35,130-Speed 3424.62 samples/sec Loss 12.0094 LearningRate 0.1506 Epoch: 3 Global Step: 16720 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:43:38,065-Speed 3489.90 samples/sec Loss 11.9904 LearningRate 0.1506 Epoch: 3 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:41,011-Speed 3477.59 samples/sec Loss 12.1761 LearningRate 0.1506 Epoch: 3 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:43,969-Speed 3462.12 samples/sec Loss 12.0709 LearningRate 0.1505 Epoch: 3 Global Step: 16750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:46,902-Speed 3492.05 samples/sec Loss 11.8530 LearningRate 0.1505 Epoch: 3 Global Step: 16760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:49,837-Speed 3490.89 samples/sec Loss 11.9817 LearningRate 0.1505 Epoch: 3 Global Step: 16770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:52,765-Speed 3498.35 samples/sec Loss 12.1044 LearningRate 0.1504 Epoch: 3 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:55,727-Speed 3457.70 samples/sec Loss 12.0465 LearningRate 0.1504 Epoch: 3 Global Step: 16790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:43:58,719-Speed 3423.22 samples/sec Loss 12.0476 LearningRate 0.1504 Epoch: 3 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:01,648-Speed 3497.66 samples/sec Loss 11.8887 LearningRate 0.1504 Epoch: 3 
Global Step: 16810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:04,603-Speed 3466.45 samples/sec Loss 11.9999 LearningRate 0.1503 Epoch: 3 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:07,537-Speed 3490.13 samples/sec Loss 12.0749 LearningRate 0.1503 Epoch: 3 Global Step: 16830 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:44:10,468-Speed 3495.09 samples/sec Loss 12.0317 LearningRate 0.1503 Epoch: 3 Global Step: 16840 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:44:13,418-Speed 3471.90 samples/sec Loss 12.1102 LearningRate 0.1502 Epoch: 3 Global Step: 16850 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:44:16,367-Speed 3474.01 samples/sec Loss 12.1286 LearningRate 0.1502 Epoch: 3 Global Step: 16860 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:44:19,297-Speed 3496.15 samples/sec Loss 11.8816 LearningRate 0.1502 Epoch: 3 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:22,247-Speed 3471.33 samples/sec Loss 12.1054 LearningRate 0.1502 Epoch: 3 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:25,273-Speed 3386.27 samples/sec Loss 11.9525 LearningRate 0.1501 Epoch: 3 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:28,215-Speed 3480.30 samples/sec Loss 11.9038 LearningRate 0.1501 Epoch: 3 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:31,143-Speed 3499.02 samples/sec Loss 11.8537 LearningRate 0.1501 Epoch: 3 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:34,072-Speed 3495.78 samples/sec Loss 12.1466 LearningRate 0.1501 Epoch: 3 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:37,010-Speed 3486.79 samples/sec Loss 12.0692 LearningRate 0.1500 Epoch: 3 Global Step: 16930 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:39,956-Speed 3477.23 samples/sec Loss 12.1362 LearningRate 0.1500 Epoch: 3 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:42,899-Speed 3479.82 samples/sec Loss 11.9571 LearningRate 0.1500 Epoch: 3 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:45,842-Speed 3480.71 samples/sec Loss 12.1295 LearningRate 0.1499 Epoch: 3 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:44:48,775-Speed 3492.83 samples/sec Loss 11.9256 LearningRate 0.1499 Epoch: 3 Global Step: 16970 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:44:51,711-Speed 3487.92 samples/sec Loss 12.0300 LearningRate 0.1499 Epoch: 3 Global Step: 16980 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:44:54,735-Speed 3387.41 samples/sec Loss 11.9738 LearningRate 0.1499 Epoch: 3 Global Step: 16990 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:44:57,676-Speed 3482.53 samples/sec Loss 11.9218 LearningRate 0.1498 Epoch: 3 Global Step: 17000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:45:00,631-Speed 3466.59 samples/sec Loss 11.9812 LearningRate 0.1498 Epoch: 3 Global Step: 17010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:03,566-Speed 3489.69 samples/sec Loss 12.0084 LearningRate 0.1498 Epoch: 3 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:06,503-Speed 3487.05 samples/sec Loss 12.0188 LearningRate 0.1498 Epoch: 3 Global Step: 17030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:09,440-Speed 3488.16 samples/sec Loss 11.8974 LearningRate 0.1497 Epoch: 3 Global Step: 17040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:12,383-Speed 3479.90 samples/sec Loss 11.8928 LearningRate 0.1497 Epoch: 3 Global Step: 17050 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-01-19 18:45:15,315-Speed 3494.38 samples/sec Loss 11.9855 LearningRate 0.1497 Epoch: 3 Global Step: 17060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:18,270-Speed 3465.16 samples/sec Loss 12.0550 LearningRate 0.1496 Epoch: 3 Global Step: 17070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:21,216-Speed 3477.11 samples/sec Loss 12.2475 LearningRate 0.1496 Epoch: 3 Global Step: 17080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:24,146-Speed 3495.73 samples/sec Loss 11.8964 LearningRate 0.1496 Epoch: 3 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:27,168-Speed 3388.95 samples/sec Loss 11.9661 LearningRate 0.1496 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:30,108-Speed 3485.20 samples/sec Loss 11.8057 LearningRate 0.1495 Epoch: 3 Global Step: 17110 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:45:33,048-Speed 3483.61 samples/sec Loss 12.1152 LearningRate 0.1495 Epoch: 3 Global Step: 17120 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:45:35,984-Speed 3488.44 samples/sec Loss 11.9813 LearningRate 0.1495 Epoch: 3 Global Step: 17130 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:45:39,026-Speed 3367.17 samples/sec Loss 11.9920 LearningRate 0.1495 Epoch: 3 Global Step: 17140 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:45:41,974-Speed 3475.15 samples/sec Loss 12.0966 LearningRate 0.1494 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:45:44,893-Speed 3509.07 samples/sec Loss 12.0484 LearningRate 0.1494 Epoch: 3 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:47,854-Speed 3458.93 samples/sec Loss 12.0577 LearningRate 0.1494 Epoch: 3 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
18:45:50,824-Speed 3449.26 samples/sec Loss 12.0605 LearningRate 0.1493 Epoch: 3 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:53,911-Speed 3317.68 samples/sec Loss 11.9397 LearningRate 0.1493 Epoch: 3 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:56,873-Speed 3457.35 samples/sec Loss 11.8545 LearningRate 0.1493 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:45:59,850-Speed 3440.64 samples/sec Loss 11.9406 LearningRate 0.1493 Epoch: 3 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:02,842-Speed 3423.86 samples/sec Loss 11.9534 LearningRate 0.1492 Epoch: 3 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:05,827-Speed 3432.40 samples/sec Loss 11.8634 LearningRate 0.1492 Epoch: 3 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:08,848-Speed 3389.84 samples/sec Loss 11.9349 LearningRate 0.1492 Epoch: 3 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:11,823-Speed 3442.95 samples/sec Loss 12.0068 LearningRate 0.1492 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:14,872-Speed 3360.01 samples/sec Loss 11.8802 LearningRate 0.1491 Epoch: 3 Global Step: 17260 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:46:17,803-Speed 3493.96 samples/sec Loss 11.9848 LearningRate 0.1491 Epoch: 3 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:20,762-Speed 3461.44 samples/sec Loss 12.0294 LearningRate 0.1491 Epoch: 3 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:23,694-Speed 3493.53 samples/sec Loss 11.9931 LearningRate 0.1490 Epoch: 3 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:26,644-Speed 3471.71 
samples/sec Loss 11.7662 LearningRate 0.1490 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:29,710-Speed 3341.19 samples/sec Loss 11.9010 LearningRate 0.1490 Epoch: 3 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:32,708-Speed 3416.51 samples/sec Loss 11.8951 LearningRate 0.1490 Epoch: 3 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:35,659-Speed 3471.13 samples/sec Loss 11.8828 LearningRate 0.1489 Epoch: 3 Global Step: 17330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:38,625-Speed 3452.88 samples/sec Loss 11.7674 LearningRate 0.1489 Epoch: 3 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:41,561-Speed 3489.01 samples/sec Loss 12.0315 LearningRate 0.1489 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:44,536-Speed 3442.99 samples/sec Loss 11.9033 LearningRate 0.1489 Epoch: 3 Global Step: 17360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:47,500-Speed 3455.20 samples/sec Loss 11.8482 LearningRate 0.1488 Epoch: 3 Global Step: 17370 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:46:50,443-Speed 3481.38 samples/sec Loss 12.0721 LearningRate 0.1488 Epoch: 3 Global Step: 17380 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:46:53,375-Speed 3493.13 samples/sec Loss 11.8834 LearningRate 0.1488 Epoch: 3 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:56,395-Speed 3391.58 samples/sec Loss 12.0082 LearningRate 0.1487 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:46:59,366-Speed 3447.65 samples/sec Loss 11.8691 LearningRate 0.1487 Epoch: 3 Global Step: 17410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:02,299-Speed 3492.17 samples/sec Loss 11.9909 
LearningRate 0.1487 Epoch: 3 Global Step: 17420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:05,266-Speed 3452.96 samples/sec Loss 11.9320 LearningRate 0.1487 Epoch: 3 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:08,266-Speed 3414.18 samples/sec Loss 11.9431 LearningRate 0.1486 Epoch: 3 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:11,268-Speed 3412.10 samples/sec Loss 12.1058 LearningRate 0.1486 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:14,273-Speed 3408.63 samples/sec Loss 11.8940 LearningRate 0.1486 Epoch: 3 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:17,228-Speed 3465.67 samples/sec Loss 11.8599 LearningRate 0.1486 Epoch: 3 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:20,173-Speed 3477.75 samples/sec Loss 12.0051 LearningRate 0.1485 Epoch: 3 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:23,147-Speed 3444.19 samples/sec Loss 12.0717 LearningRate 0.1485 Epoch: 3 Global Step: 17490 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:47:26,081-Speed 3491.89 samples/sec Loss 11.9811 LearningRate 0.1485 Epoch: 3 Global Step: 17500 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:47:29,059-Speed 3439.37 samples/sec Loss 12.0617 LearningRate 0.1484 Epoch: 3 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:32,049-Speed 3425.61 samples/sec Loss 11.6678 LearningRate 0.1484 Epoch: 3 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:34,995-Speed 3476.37 samples/sec Loss 12.0367 LearningRate 0.1484 Epoch: 3 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:37,929-Speed 3491.33 samples/sec Loss 11.8929 LearningRate 0.1484 Epoch: 3 
Global Step: 17540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:40,865-Speed 3489.30 samples/sec Loss 12.0653 LearningRate 0.1483 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:43,836-Speed 3446.62 samples/sec Loss 12.0768 LearningRate 0.1483 Epoch: 3 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:46,771-Speed 3490.05 samples/sec Loss 11.9305 LearningRate 0.1483 Epoch: 3 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:49,743-Speed 3446.37 samples/sec Loss 11.8637 LearningRate 0.1483 Epoch: 3 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:52,730-Speed 3428.50 samples/sec Loss 11.9236 LearningRate 0.1482 Epoch: 3 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:55,679-Speed 3473.87 samples/sec Loss 11.7243 LearningRate 0.1482 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:47:58,626-Speed 3475.90 samples/sec Loss 11.8775 LearningRate 0.1482 Epoch: 3 Global Step: 17610 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:01,556-Speed 3496.17 samples/sec Loss 11.8840 LearningRate 0.1481 Epoch: 3 Global Step: 17620 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:04,489-Speed 3492.63 samples/sec Loss 11.6960 LearningRate 0.1481 Epoch: 3 Global Step: 17630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:07,451-Speed 3457.01 samples/sec Loss 11.8558 LearningRate 0.1481 Epoch: 3 Global Step: 17640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:10,389-Speed 3487.00 samples/sec Loss 11.9146 LearningRate 0.1481 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:13,335-Speed 3476.14 samples/sec Loss 11.9022 LearningRate 0.1480 Epoch: 3 Global Step: 17660 Fp16 Grad 
Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:16,275-Speed 3484.08 samples/sec Loss 11.9020 LearningRate 0.1480 Epoch: 3 Global Step: 17670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:19,209-Speed 3491.62 samples/sec Loss 11.9349 LearningRate 0.1480 Epoch: 3 Global Step: 17680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:22,142-Speed 3492.05 samples/sec Loss 11.9169 LearningRate 0.1480 Epoch: 3 Global Step: 17690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:25,073-Speed 3495.10 samples/sec Loss 11.8643 LearningRate 0.1479 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:28,009-Speed 3488.01 samples/sec Loss 11.8659 LearningRate 0.1479 Epoch: 3 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:30,941-Speed 3493.86 samples/sec Loss 11.9200 LearningRate 0.1479 Epoch: 3 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:33,880-Speed 3485.18 samples/sec Loss 11.7569 LearningRate 0.1478 Epoch: 3 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:36,817-Speed 3487.28 samples/sec Loss 11.8590 LearningRate 0.1478 Epoch: 3 Global Step: 17740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:39,752-Speed 3490.45 samples/sec Loss 11.7521 LearningRate 0.1478 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:42,708-Speed 3464.72 samples/sec Loss 11.7068 LearningRate 0.1478 Epoch: 3 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:45,652-Speed 3478.62 samples/sec Loss 11.7761 LearningRate 0.1477 Epoch: 3 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:48:48,592-Speed 3484.06 samples/sec Loss 11.7029 LearningRate 0.1477 Epoch: 3 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-01-19 18:48:51,525-Speed 3492.63 samples/sec Loss 11.7665 LearningRate 0.1477 Epoch: 3 Global Step: 17790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:54,464-Speed 3485.10 samples/sec Loss 11.8923 LearningRate 0.1477 Epoch: 3 Global Step: 17800 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:48:57,383-Speed 3508.72 samples/sec Loss 11.9327 LearningRate 0.1476 Epoch: 3 Global Step: 17810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:00,320-Speed 3488.02 samples/sec Loss 12.0270 LearningRate 0.1476 Epoch: 3 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:03,281-Speed 3459.25 samples/sec Loss 11.9541 LearningRate 0.1476 Epoch: 3 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:06,235-Speed 3467.24 samples/sec Loss 11.8471 LearningRate 0.1475 Epoch: 3 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:09,185-Speed 3472.79 samples/sec Loss 11.8417 LearningRate 0.1475 Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:12,121-Speed 3488.22 samples/sec Loss 11.8195 LearningRate 0.1475 Epoch: 3 Global Step: 17860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:15,055-Speed 3491.34 samples/sec Loss 11.8323 LearningRate 0.1475 Epoch: 3 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:17,987-Speed 3492.39 samples/sec Loss 12.0044 LearningRate 0.1474 Epoch: 3 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:21,003-Speed 3397.06 samples/sec Loss 11.8700 LearningRate 0.1474 Epoch: 3 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:23,940-Speed 3487.67 samples/sec Loss 11.8074 LearningRate 0.1474 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
18:49:26,862-Speed 3504.73 samples/sec Loss 11.8594 LearningRate 0.1474 Epoch: 3 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:29,796-Speed 3490.94 samples/sec Loss 11.7412 LearningRate 0.1473 Epoch: 3 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:32,737-Speed 3482.58 samples/sec Loss 11.8912 LearningRate 0.1473 Epoch: 3 Global Step: 17930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:35,673-Speed 3489.78 samples/sec Loss 11.8803 LearningRate 0.1473 Epoch: 3 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:38,605-Speed 3492.84 samples/sec Loss 11.9782 LearningRate 0.1472 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:41,536-Speed 3495.04 samples/sec Loss 11.9581 LearningRate 0.1472 Epoch: 3 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:44,473-Speed 3487.21 samples/sec Loss 11.8543 LearningRate 0.1472 Epoch: 3 Global Step: 17970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:47,419-Speed 3476.64 samples/sec Loss 11.8562 LearningRate 0.1472 Epoch: 3 Global Step: 17980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:50,415-Speed 3419.77 samples/sec Loss 11.9455 LearningRate 0.1471 Epoch: 3 Global Step: 17990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:49:53,351-Speed 3488.53 samples/sec Loss 12.0833 LearningRate 0.1471 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:50:36,826-[lfw][18000]XNorm: 23.040139 Training: 2022-01-19 18:50:36,827-[lfw][18000]Accuracy-Flip: 0.99550+-0.00395 Training: 2022-01-19 18:50:36,828-[lfw][18000]Accuracy-Highest: 0.99583 Training: 2022-01-19 18:51:27,535-[cfp_fp][18000]XNorm: 19.850835 Training: 2022-01-19 18:51:27,536-[cfp_fp][18000]Accuracy-Flip: 0.94000+-0.01187 Training: 2022-01-19 
18:51:27,536-[cfp_fp][18000]Accuracy-Highest: 0.94429 Training: 2022-01-19 18:52:11,098-[agedb_30][18000]XNorm: 22.420205 Training: 2022-01-19 18:52:11,099-[agedb_30][18000]Accuracy-Flip: 0.96700+-0.00966 Training: 2022-01-19 18:52:11,100-[agedb_30][18000]Accuracy-Highest: 0.96700 Training: 2022-01-19 18:52:14,071-Speed 72.77 samples/sec Loss 11.8618 LearningRate 0.1471 Epoch: 3 Global Step: 18010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:17,010-Speed 3485.54 samples/sec Loss 11.9022 LearningRate 0.1471 Epoch: 3 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:19,972-Speed 3457.55 samples/sec Loss 11.8444 LearningRate 0.1470 Epoch: 3 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:22,900-Speed 3498.54 samples/sec Loss 11.6478 LearningRate 0.1470 Epoch: 3 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:25,929-Speed 3382.59 samples/sec Loss 11.9669 LearningRate 0.1470 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:28,898-Speed 3449.21 samples/sec Loss 11.8998 LearningRate 0.1470 Epoch: 3 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:31,842-Speed 3479.13 samples/sec Loss 11.7974 LearningRate 0.1469 Epoch: 3 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:34,883-Speed 3369.02 samples/sec Loss 11.8598 LearningRate 0.1469 Epoch: 3 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:37,906-Speed 3388.52 samples/sec Loss 12.0846 LearningRate 0.1469 Epoch: 3 Global Step: 18090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:40,838-Speed 3494.28 samples/sec Loss 11.8309 LearningRate 0.1468 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:52:43,794-Speed 3464.32 samples/sec Loss 11.7172 
LearningRate 0.1468 Epoch: 3 Global Step: 18110 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:52:46,710-Speed 3513.30 samples/sec Loss 11.9341 LearningRate 0.1468 Epoch: 3 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:52:49,667-Speed 3463.48 samples/sec Loss 11.7853 LearningRate 0.1468 Epoch: 3 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:52:52,612-Speed 3479.26 samples/sec Loss 11.8676 LearningRate 0.1467 Epoch: 3 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:52:55,548-Speed 3488.39 samples/sec Loss 11.7936 LearningRate 0.1467 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:52:58,483-Speed 3491.01 samples/sec Loss 11.9500 LearningRate 0.1467 Epoch: 3 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:53:01,418-Speed 3489.04 samples/sec Loss 11.7828 LearningRate 0.1467 Epoch: 3 Global Step: 18170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:53:04,352-Speed 3490.90 samples/sec Loss 11.8964 LearningRate 0.1466 Epoch: 3 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:53:07,293-Speed 3483.17 samples/sec Loss 11.8775 LearningRate 0.1466 Epoch: 3 Global Step: 18190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:53:10,240-Speed 3475.31 samples/sec Loss 11.8698 LearningRate 0.1466 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:53:13,332-Speed 3313.09 samples/sec Loss 11.8797 LearningRate 0.1465 Epoch: 3 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:53:16,431-Speed 3305.42 samples/sec Loss 11.9817 LearningRate 0.1465 Epoch: 3 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:19,431-Speed 3413.58 samples/sec Loss 11.9627 LearningRate 0.1465 Epoch: 3 Global Step: 
18230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:22,371-Speed 3484.19 samples/sec Loss 11.8260 LearningRate 0.1465 Epoch: 3 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:25,341-Speed 3449.40 samples/sec Loss 11.6371 LearningRate 0.1464 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:28,289-Speed 3475.05 samples/sec Loss 11.7654 LearningRate 0.1464 Epoch: 3 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:31,230-Speed 3482.12 samples/sec Loss 11.6057 LearningRate 0.1464 Epoch: 3 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:34,207-Speed 3440.74 samples/sec Loss 11.7317 LearningRate 0.1464 Epoch: 3 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:37,141-Speed 3491.21 samples/sec Loss 11.5772 LearningRate 0.1463 Epoch: 3 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:40,072-Speed 3494.61 samples/sec Loss 11.7577 LearningRate 0.1463 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:43,005-Speed 3492.13 samples/sec Loss 11.7580 LearningRate 0.1463 Epoch: 3 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:53:45,949-Speed 3478.56 samples/sec Loss 11.6324 LearningRate 0.1462 Epoch: 3 Global Step: 18320 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:53:48,888-Speed 3486.04 samples/sec Loss 11.8531 LearningRate 0.1462 Epoch: 3 Global Step: 18330 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:53:51,824-Speed 3488.22 samples/sec Loss 11.7439 LearningRate 0.1462 Epoch: 3 Global Step: 18340 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:53:54,747-Speed 3503.71 samples/sec Loss 11.5802 LearningRate 0.1462 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-19 18:53:57,712-Speed 3455.33 samples/sec Loss 11.6895 LearningRate 0.1461 Epoch: 3 Global Step: 18360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:00,686-Speed 3443.95 samples/sec Loss 11.6442 LearningRate 0.1461 Epoch: 3 Global Step: 18370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:03,698-Speed 3400.79 samples/sec Loss 11.7683 LearningRate 0.1461 Epoch: 3 Global Step: 18380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:06,650-Speed 3469.28 samples/sec Loss 11.7171 LearningRate 0.1461 Epoch: 3 Global Step: 18390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:09,693-Speed 3366.52 samples/sec Loss 11.6710 LearningRate 0.1460 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:12,795-Speed 3302.27 samples/sec Loss 11.6542 LearningRate 0.1460 Epoch: 3 Global Step: 18410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:15,764-Speed 3449.03 samples/sec Loss 11.7625 LearningRate 0.1460 Epoch: 3 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:18,699-Speed 3489.97 samples/sec Loss 11.8869 LearningRate 0.1459 Epoch: 3 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:21,654-Speed 3466.28 samples/sec Loss 11.6723 LearningRate 0.1459 Epoch: 3 Global Step: 18440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:24,625-Speed 3447.31 samples/sec Loss 11.6354 LearningRate 0.1459 Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:27,653-Speed 3383.65 samples/sec Loss 11.6745 LearningRate 0.1459 Epoch: 3 Global Step: 18460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:30,606-Speed 3468.63 samples/sec Loss 11.6086 LearningRate 0.1458 Epoch: 3 Global Step: 18470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-01-19 18:54:33,557-Speed 3471.96 samples/sec Loss 11.6537 LearningRate 0.1458 Epoch: 3 Global Step: 18480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:36,496-Speed 3484.78 samples/sec Loss 11.7852 LearningRate 0.1458 Epoch: 3 Global Step: 18490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:39,489-Speed 3421.25 samples/sec Loss 11.5384 LearningRate 0.1458 Epoch: 3 Global Step: 18500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:42,442-Speed 3468.81 samples/sec Loss 11.6023 LearningRate 0.1457 Epoch: 3 Global Step: 18510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:54:45,375-Speed 3492.62 samples/sec Loss 11.9613 LearningRate 0.1457 Epoch: 3 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:48,307-Speed 3493.78 samples/sec Loss 11.9048 LearningRate 0.1457 Epoch: 3 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:51,235-Speed 3497.56 samples/sec Loss 11.5599 LearningRate 0.1457 Epoch: 3 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:54,170-Speed 3490.38 samples/sec Loss 11.5364 LearningRate 0.1456 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:54:57,118-Speed 3474.13 samples/sec Loss 11.7408 LearningRate 0.1456 Epoch: 3 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:00,056-Speed 3487.21 samples/sec Loss 11.7617 LearningRate 0.1456 Epoch: 3 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:03,026-Speed 3448.61 samples/sec Loss 11.7830 LearningRate 0.1455 Epoch: 3 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:06,100-Speed 3332.45 samples/sec Loss 11.7552 LearningRate 0.1455 Epoch: 3 Global Step: 18590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:09,068-Speed 
3450.90 samples/sec Loss 11.8519 LearningRate 0.1455 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:12,000-Speed 3493.24 samples/sec Loss 12.0056 LearningRate 0.1455 Epoch: 3 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:14,934-Speed 3491.10 samples/sec Loss 11.8404 LearningRate 0.1454 Epoch: 3 Global Step: 18620 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:55:17,936-Speed 3412.17 samples/sec Loss 11.8176 LearningRate 0.1454 Epoch: 3 Global Step: 18630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:55:20,912-Speed 3440.82 samples/sec Loss 11.7358 LearningRate 0.1454 Epoch: 3 Global Step: 18640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:55:23,843-Speed 3496.17 samples/sec Loss 11.6546 LearningRate 0.1454 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:55:26,804-Speed 3458.30 samples/sec Loss 11.7968 LearningRate 0.1453 Epoch: 3 Global Step: 18660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:55:29,759-Speed 3466.04 samples/sec Loss 11.6911 LearningRate 0.1453 Epoch: 3 Global Step: 18670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:55:32,733-Speed 3445.40 samples/sec Loss 11.8810 LearningRate 0.1453 Epoch: 3 Global Step: 18680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:55:35,756-Speed 3387.14 samples/sec Loss 11.7583 LearningRate 0.1452 Epoch: 3 Global Step: 18690 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:55:38,704-Speed 3474.89 samples/sec Loss 11.7789 LearningRate 0.1452 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:41,637-Speed 3492.55 samples/sec Loss 11.8301 LearningRate 0.1452 Epoch: 3 Global Step: 18710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:44,577-Speed 3483.98 samples/sec Loss 
11.8360 LearningRate 0.1452 Epoch: 3 Global Step: 18720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:47,521-Speed 3479.77 samples/sec Loss 11.7117 LearningRate 0.1451 Epoch: 3 Global Step: 18730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:50,458-Speed 3487.36 samples/sec Loss 11.6027 LearningRate 0.1451 Epoch: 3 Global Step: 18740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:53,415-Speed 3464.49 samples/sec Loss 11.8530 LearningRate 0.1451 Epoch: 3 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:56,368-Speed 3468.24 samples/sec Loss 11.6047 LearningRate 0.1451 Epoch: 3 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:55:59,326-Speed 3463.27 samples/sec Loss 11.6420 LearningRate 0.1450 Epoch: 3 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:02,247-Speed 3506.40 samples/sec Loss 11.6139 LearningRate 0.1450 Epoch: 3 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:05,193-Speed 3476.92 samples/sec Loss 11.6623 LearningRate 0.1450 Epoch: 3 Global Step: 18790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:08,131-Speed 3486.48 samples/sec Loss 11.6938 LearningRate 0.1450 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:11,064-Speed 3492.55 samples/sec Loss 11.6045 LearningRate 0.1449 Epoch: 3 Global Step: 18810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:14,012-Speed 3474.49 samples/sec Loss 11.9579 LearningRate 0.1449 Epoch: 3 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:16,947-Speed 3490.54 samples/sec Loss 11.5591 LearningRate 0.1449 Epoch: 3 Global Step: 18830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:19,875-Speed 3497.91 samples/sec Loss 11.6521 LearningRate 0.1448 Epoch: 3 
Global Step: 18840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:22,817-Speed 3482.66 samples/sec Loss 11.6147 LearningRate 0.1448 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:25,832-Speed 3397.17 samples/sec Loss 11.6140 LearningRate 0.1448 Epoch: 3 Global Step: 18860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:28,762-Speed 3495.31 samples/sec Loss 11.8057 LearningRate 0.1448 Epoch: 3 Global Step: 18870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:56:31,697-Speed 3490.43 samples/sec Loss 11.6905 LearningRate 0.1447 Epoch: 3 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:34,629-Speed 3493.60 samples/sec Loss 11.6254 LearningRate 0.1447 Epoch: 3 Global Step: 18890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:37,567-Speed 3486.87 samples/sec Loss 11.7383 LearningRate 0.1447 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:40,500-Speed 3492.00 samples/sec Loss 11.6595 LearningRate 0.1447 Epoch: 3 Global Step: 18910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:43,446-Speed 3477.25 samples/sec Loss 11.8207 LearningRate 0.1446 Epoch: 3 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:46,378-Speed 3493.65 samples/sec Loss 11.6451 LearningRate 0.1446 Epoch: 3 Global Step: 18930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:49,312-Speed 3491.35 samples/sec Loss 11.6569 LearningRate 0.1446 Epoch: 3 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:52,268-Speed 3465.68 samples/sec Loss 11.4924 LearningRate 0.1445 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:56:55,220-Speed 3469.90 samples/sec Loss 11.7357 LearningRate 0.1445 Epoch: 3 Global Step: 18960 Fp16 Grad Scale: 
131072 Required: 11 hours Training: 2022-01-19 18:56:58,238-Speed 3394.00 samples/sec Loss 11.8529 LearningRate 0.1445 Epoch: 3 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:01,205-Speed 3451.75 samples/sec Loss 11.5704 LearningRate 0.1445 Epoch: 3 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:04,154-Speed 3473.78 samples/sec Loss 11.4615 LearningRate 0.1444 Epoch: 3 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:07,089-Speed 3489.66 samples/sec Loss 11.5668 LearningRate 0.1444 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:10,046-Speed 3464.03 samples/sec Loss 11.5619 LearningRate 0.1444 Epoch: 3 Global Step: 19010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:12,984-Speed 3487.25 samples/sec Loss 11.6689 LearningRate 0.1444 Epoch: 3 Global Step: 19020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:15,922-Speed 3486.33 samples/sec Loss 11.7280 LearningRate 0.1443 Epoch: 3 Global Step: 19030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:18,910-Speed 3427.82 samples/sec Loss 11.7550 LearningRate 0.1443 Epoch: 3 Global Step: 19040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:21,860-Speed 3472.30 samples/sec Loss 11.6240 LearningRate 0.1443 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:24,794-Speed 3490.75 samples/sec Loss 11.7838 LearningRate 0.1443 Epoch: 3 Global Step: 19060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:27,757-Speed 3456.76 samples/sec Loss 11.6839 LearningRate 0.1442 Epoch: 3 Global Step: 19070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:30,692-Speed 3489.39 samples/sec Loss 11.6452 LearningRate 0.1442 Epoch: 3 Global Step: 19080 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-01-19 18:57:33,643-Speed 3471.46 samples/sec Loss 11.6652 LearningRate 0.1442 Epoch: 3 Global Step: 19090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:36,577-Speed 3491.32 samples/sec Loss 11.6644 LearningRate 0.1441 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:57:39,534-Speed 3463.32 samples/sec Loss 11.6790 LearningRate 0.1441 Epoch: 3 Global Step: 19110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:57:42,472-Speed 3487.10 samples/sec Loss 11.6368 LearningRate 0.1441 Epoch: 3 Global Step: 19120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:57:45,411-Speed 3485.28 samples/sec Loss 11.5027 LearningRate 0.1441 Epoch: 3 Global Step: 19130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:57:48,342-Speed 3494.89 samples/sec Loss 11.7446 LearningRate 0.1440 Epoch: 3 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:57:51,277-Speed 3489.30 samples/sec Loss 11.4992 LearningRate 0.1440 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:57:54,209-Speed 3492.85 samples/sec Loss 11.5319 LearningRate 0.1440 Epoch: 3 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:57:57,155-Speed 3477.92 samples/sec Loss 11.5082 LearningRate 0.1440 Epoch: 3 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:58:00,089-Speed 3491.42 samples/sec Loss 11.7916 LearningRate 0.1439 Epoch: 3 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:58:03,020-Speed 3494.33 samples/sec Loss 11.7257 LearningRate 0.1439 Epoch: 3 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:58:05,951-Speed 3495.05 samples/sec Loss 11.9020 LearningRate 0.1439 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 18:58:08,884-Speed 
3492.46 samples/sec Loss 11.6184 LearningRate 0.1438 Epoch: 3 Global Step: 19210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:11,824-Speed 3483.35 samples/sec Loss 11.6675 LearningRate 0.1438 Epoch: 3 Global Step: 19220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:14,793-Speed 3449.76 samples/sec Loss 11.7515 LearningRate 0.1438 Epoch: 3 Global Step: 19230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:17,722-Speed 3497.41 samples/sec Loss 11.7452 LearningRate 0.1438 Epoch: 3 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:20,650-Speed 3498.31 samples/sec Loss 11.6512 LearningRate 0.1437 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:23,584-Speed 3490.73 samples/sec Loss 11.6272 LearningRate 0.1437 Epoch: 3 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:26,528-Speed 3479.80 samples/sec Loss 11.6212 LearningRate 0.1437 Epoch: 3 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:29,459-Speed 3494.34 samples/sec Loss 11.5206 LearningRate 0.1437 Epoch: 3 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:32,398-Speed 3485.41 samples/sec Loss 11.7956 LearningRate 0.1436 Epoch: 3 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:35,336-Speed 3486.35 samples/sec Loss 11.5815 LearningRate 0.1436 Epoch: 3 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:38,287-Speed 3470.94 samples/sec Loss 11.5964 LearningRate 0.1436 Epoch: 3 Global Step: 19310 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:58:41,225-Speed 3486.19 samples/sec Loss 11.6222 LearningRate 0.1436 Epoch: 3 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:44,154-Speed 3497.20 samples/sec Loss 
11.7801 LearningRate 0.1435 Epoch: 3 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:47,085-Speed 3495.28 samples/sec Loss 11.7402 LearningRate 0.1435 Epoch: 3 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:50,014-Speed 3496.47 samples/sec Loss 11.6831 LearningRate 0.1435 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:52,942-Speed 3498.69 samples/sec Loss 11.8647 LearningRate 0.1434 Epoch: 3 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:55,889-Speed 3474.98 samples/sec Loss 11.6434 LearningRate 0.1434 Epoch: 3 Global Step: 19370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:58:58,819-Speed 3495.92 samples/sec Loss 11.4248 LearningRate 0.1434 Epoch: 3 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:01,763-Speed 3478.91 samples/sec Loss 11.5158 LearningRate 0.1434 Epoch: 3 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:04,703-Speed 3485.51 samples/sec Loss 11.5772 LearningRate 0.1433 Epoch: 3 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:07,636-Speed 3491.62 samples/sec Loss 11.8175 LearningRate 0.1433 Epoch: 3 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:10,572-Speed 3487.98 samples/sec Loss 11.6281 LearningRate 0.1433 Epoch: 3 Global Step: 19420 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:59:13,504-Speed 3493.84 samples/sec Loss 11.6096 LearningRate 0.1433 Epoch: 3 Global Step: 19430 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:59:16,440-Speed 3489.02 samples/sec Loss 11.5019 LearningRate 0.1432 Epoch: 3 Global Step: 19440 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:59:19,373-Speed 3492.80 samples/sec Loss 11.3661 LearningRate 0.1432 
Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:59:22,317-Speed 3478.68 samples/sec Loss 11.6232 LearningRate 0.1432 Epoch: 3 Global Step: 19460 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 18:59:25,242-Speed 3501.94 samples/sec Loss 11.7662 LearningRate 0.1432 Epoch: 3 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:28,179-Speed 3487.08 samples/sec Loss 11.5928 LearningRate 0.1431 Epoch: 3 Global Step: 19480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:31,116-Speed 3487.20 samples/sec Loss 11.5773 LearningRate 0.1431 Epoch: 3 Global Step: 19490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:34,054-Speed 3486.98 samples/sec Loss 11.6445 LearningRate 0.1431 Epoch: 3 Global Step: 19500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:37,020-Speed 3453.37 samples/sec Loss 11.6340 LearningRate 0.1430 Epoch: 3 Global Step: 19510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:39,956-Speed 3488.57 samples/sec Loss 11.5004 LearningRate 0.1430 Epoch: 3 Global Step: 19520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:42,899-Speed 3481.05 samples/sec Loss 11.3371 LearningRate 0.1430 Epoch: 3 Global Step: 19530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:45,867-Speed 3450.49 samples/sec Loss 11.6126 LearningRate 0.1430 Epoch: 3 Global Step: 19540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:48,798-Speed 3495.38 samples/sec Loss 11.4980 LearningRate 0.1429 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:51,795-Speed 3417.54 samples/sec Loss 11.5364 LearningRate 0.1429 Epoch: 3 Global Step: 19560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:54,730-Speed 3490.45 samples/sec Loss 11.5030 LearningRate 0.1429 Epoch: 3 Global Step: 19570 
Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 18:59:57,664-Speed 3490.39 samples/sec Loss 11.5930 LearningRate 0.1429 Epoch: 3 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:00,605-Speed 3483.42 samples/sec Loss 11.5618 LearningRate 0.1428 Epoch: 3 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:03,540-Speed 3490.82 samples/sec Loss 11.5757 LearningRate 0.1428 Epoch: 3 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:06,478-Speed 3486.11 samples/sec Loss 11.6155 LearningRate 0.1428 Epoch: 3 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:09,410-Speed 3495.16 samples/sec Loss 11.5227 LearningRate 0.1428 Epoch: 3 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:12,348-Speed 3485.68 samples/sec Loss 11.6483 LearningRate 0.1427 Epoch: 3 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:15,284-Speed 3489.22 samples/sec Loss 11.4799 LearningRate 0.1427 Epoch: 3 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:18,232-Speed 3474.78 samples/sec Loss 11.5475 LearningRate 0.1427 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:21,238-Speed 3407.37 samples/sec Loss 11.4249 LearningRate 0.1426 Epoch: 3 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:00:24,179-Speed 3483.41 samples/sec Loss 11.6862 LearningRate 0.1426 Epoch: 3 Global Step: 19670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:00:27,144-Speed 3454.75 samples/sec Loss 11.6395 LearningRate 0.1426 Epoch: 3 Global Step: 19680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:00:30,073-Speed 3496.44 samples/sec Loss 11.5052 LearningRate 0.1426 Epoch: 3 Global Step: 19690 Fp16 Grad Scale: 262144 
Required: 11 hours Training: 2022-01-19 19:00:33,080-Speed 3406.93 samples/sec Loss 11.5336 LearningRate 0.1425 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:36,041-Speed 3458.45 samples/sec Loss 11.6717 LearningRate 0.1425 Epoch: 3 Global Step: 19710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:38,985-Speed 3479.78 samples/sec Loss 11.4888 LearningRate 0.1425 Epoch: 3 Global Step: 19720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:41,920-Speed 3489.82 samples/sec Loss 11.6017 LearningRate 0.1425 Epoch: 3 Global Step: 19730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:44,894-Speed 3445.11 samples/sec Loss 11.5636 LearningRate 0.1424 Epoch: 3 Global Step: 19740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:48,153-Speed 3143.31 samples/sec Loss 11.4313 LearningRate 0.1424 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:51,094-Speed 3481.90 samples/sec Loss 11.4819 LearningRate 0.1424 Epoch: 3 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:54,043-Speed 3474.17 samples/sec Loss 11.5636 LearningRate 0.1424 Epoch: 3 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:56,976-Speed 3491.50 samples/sec Loss 11.5703 LearningRate 0.1423 Epoch: 3 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:00:59,935-Speed 3462.20 samples/sec Loss 11.6414 LearningRate 0.1423 Epoch: 3 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:01:02,880-Speed 3478.07 samples/sec Loss 11.3754 LearningRate 0.1423 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:05,820-Speed 3483.42 samples/sec Loss 11.4794 LearningRate 0.1422 Epoch: 3 Global Step: 19810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
19:01:08,757-Speed 3487.86 samples/sec Loss 11.7405 LearningRate 0.1422 Epoch: 3 Global Step: 19820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:11,692-Speed 3488.96 samples/sec Loss 11.6244 LearningRate 0.1422 Epoch: 3 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:14,638-Speed 3477.30 samples/sec Loss 11.4435 LearningRate 0.1422 Epoch: 3 Global Step: 19840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:17,587-Speed 3473.41 samples/sec Loss 11.4890 LearningRate 0.1421 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:20,540-Speed 3468.70 samples/sec Loss 11.4663 LearningRate 0.1421 Epoch: 3 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:23,498-Speed 3462.47 samples/sec Loss 11.4760 LearningRate 0.1421 Epoch: 3 Global Step: 19870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:26,439-Speed 3482.98 samples/sec Loss 11.5500 LearningRate 0.1421 Epoch: 3 Global Step: 19880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:29,381-Speed 3482.30 samples/sec Loss 11.6779 LearningRate 0.1420 Epoch: 3 Global Step: 19890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:01:32,324-Speed 3480.03 samples/sec Loss 11.4459 LearningRate 0.1420 Epoch: 3 Global Step: 19900 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:01:35,259-Speed 3489.36 samples/sec Loss 11.4899 LearningRate 0.1420 Epoch: 3 Global Step: 19910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:01:38,203-Speed 3490.04 samples/sec Loss 11.5740 LearningRate 0.1420 Epoch: 3 Global Step: 19920 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:01:41,153-Speed 3472.07 samples/sec Loss 11.6389 LearningRate 0.1419 Epoch: 3 Global Step: 19930 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:01:44,082-Speed 3497.19 
samples/sec Loss 11.4121 LearningRate 0.1419 Epoch: 3 Global Step: 19940 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 19:01:47,115-Speed 3377.04 samples/sec Loss 11.3235 LearningRate 0.1419 Epoch: 3 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 19:01:50,049-Speed 3490.84 samples/sec Loss 11.4788 LearningRate 0.1418 Epoch: 3 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 19:01:53,019-Speed 3448.58 samples/sec Loss 11.3452 LearningRate 0.1418 Epoch: 3 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 19:01:56,015-Speed 3419.96 samples/sec Loss 11.6276 LearningRate 0.1418 Epoch: 3 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-19 19:01:58,976-Speed 3459.52 samples/sec Loss 11.6276 LearningRate 0.1418 Epoch: 3 Global Step: 19990 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 19:02:01,967-Speed 3424.16 samples/sec Loss 11.5452 LearningRate 0.1417 Epoch: 3 Global Step: 20000 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 19:02:45,143-[lfw][20000]XNorm: 20.996162
Training: 2022-01-19 19:02:45,144-[lfw][20000]Accuracy-Flip: 0.99550+-0.00380
Training: 2022-01-19 19:02:45,144-[lfw][20000]Accuracy-Highest: 0.99583
Training: 2022-01-19 19:03:35,277-[cfp_fp][20000]XNorm: 18.036152
Training: 2022-01-19 19:03:35,291-[cfp_fp][20000]Accuracy-Flip: 0.93714+-0.01244
Training: 2022-01-19 19:03:35,291-[cfp_fp][20000]Accuracy-Highest: 0.94429
Training: 2022-01-19 19:04:18,368-[agedb_30][20000]XNorm: 20.267056
Training: 2022-01-19 19:04:18,368-[agedb_30][20000]Accuracy-Flip: 0.96783+-0.00857
Training: 2022-01-19 19:04:18,369-[agedb_30][20000]Accuracy-Highest: 0.96783
Training: 2022-01-19 19:04:21,325-Speed 73.48 samples/sec Loss 11.4648 LearningRate 0.1417 Epoch: 3 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-19 19:04:24,285-Speed 3459.90 samples/sec Loss 11.4229
LearningRate 0.1417 Epoch: 3 Global Step: 20020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:04:27,237-Speed 3470.17 samples/sec Loss 11.5719 LearningRate 0.1417 Epoch: 3 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:04:30,179-Speed 3482.07 samples/sec Loss 11.5529 LearningRate 0.1416 Epoch: 3 Global Step: 20040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:04:33,163-Speed 3432.11 samples/sec Loss 11.3735 LearningRate 0.1416 Epoch: 3 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:04:36,102-Speed 3485.25 samples/sec Loss 11.6193 LearningRate 0.1416 Epoch: 3 Global Step: 20060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:04:39,059-Speed 3464.02 samples/sec Loss 11.6190 LearningRate 0.1416 Epoch: 3 Global Step: 20070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:04:42,009-Speed 3471.69 samples/sec Loss 11.4628 LearningRate 0.1415 Epoch: 3 Global Step: 20080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:04:44,940-Speed 3495.29 samples/sec Loss 11.4319 LearningRate 0.1415 Epoch: 3 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:04:47,871-Speed 3494.80 samples/sec Loss 11.4154 LearningRate 0.1415 Epoch: 3 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:04:50,830-Speed 3462.03 samples/sec Loss 11.4757 LearningRate 0.1414 Epoch: 3 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:04:53,757-Speed 3498.97 samples/sec Loss 11.2766 LearningRate 0.1414 Epoch: 3 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:04:56,699-Speed 3481.99 samples/sec Loss 11.6195 LearningRate 0.1414 Epoch: 3 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:04:59,647-Speed 3474.63 samples/sec Loss 11.4392 LearningRate 0.1414 Epoch: 3 Global 
Step: 20140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:02,665-Speed 3392.79 samples/sec Loss 11.4812 LearningRate 0.1413 Epoch: 3 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:05,603-Speed 3487.76 samples/sec Loss 11.3763 LearningRate 0.1413 Epoch: 3 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:08,536-Speed 3491.81 samples/sec Loss 11.5737 LearningRate 0.1413 Epoch: 3 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:11,471-Speed 3489.02 samples/sec Loss 11.7051 LearningRate 0.1413 Epoch: 3 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:14,395-Speed 3503.63 samples/sec Loss 11.4515 LearningRate 0.1412 Epoch: 3 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:17,342-Speed 3475.15 samples/sec Loss 11.5614 LearningRate 0.1412 Epoch: 3 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:20,367-Speed 3386.70 samples/sec Loss 11.3508 LearningRate 0.1412 Epoch: 3 Global Step: 20210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:23,293-Speed 3501.24 samples/sec Loss 11.5981 LearningRate 0.1412 Epoch: 3 Global Step: 20220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:26,289-Speed 3418.85 samples/sec Loss 11.3421 LearningRate 0.1411 Epoch: 3 Global Step: 20230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:40,045-Speed 744.47 samples/sec Loss 10.8436 LearningRate 0.1411 Epoch: 4 Global Step: 20240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:43,143-Speed 3306.08 samples/sec Loss 10.7430 LearningRate 0.1411 Epoch: 4 Global Step: 20250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:46,198-Speed 3352.51 samples/sec Loss 10.6546 LearningRate 0.1410 Epoch: 4 Global Step: 20260 Fp16 Grad Scale: 
131072 Required: 11 hours Training: 2022-01-19 19:05:49,157-Speed 3462.63 samples/sec Loss 11.0239 LearningRate 0.1410 Epoch: 4 Global Step: 20270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:52,111-Speed 3467.06 samples/sec Loss 10.7530 LearningRate 0.1410 Epoch: 4 Global Step: 20280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:55,039-Speed 3498.16 samples/sec Loss 10.8319 LearningRate 0.1410 Epoch: 4 Global Step: 20290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:05:58,061-Speed 3389.27 samples/sec Loss 10.6973 LearningRate 0.1409 Epoch: 4 Global Step: 20300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:00,994-Speed 3493.26 samples/sec Loss 10.6847 LearningRate 0.1409 Epoch: 4 Global Step: 20310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:03,931-Speed 3487.07 samples/sec Loss 10.9123 LearningRate 0.1409 Epoch: 4 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:06,874-Speed 3480.82 samples/sec Loss 10.8746 LearningRate 0.1409 Epoch: 4 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:09,813-Speed 3484.91 samples/sec Loss 10.8305 LearningRate 0.1408 Epoch: 4 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:12,751-Speed 3485.40 samples/sec Loss 10.9463 LearningRate 0.1408 Epoch: 4 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:15,773-Speed 3390.49 samples/sec Loss 11.0600 LearningRate 0.1408 Epoch: 4 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:18,724-Speed 3470.06 samples/sec Loss 11.0581 LearningRate 0.1408 Epoch: 4 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:21,662-Speed 3487.14 samples/sec Loss 10.9401 LearningRate 0.1407 Epoch: 4 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-01-19 19:06:24,633-Speed 3447.23 samples/sec Loss 11.0543 LearningRate 0.1407 Epoch: 4 Global Step: 20390 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:06:27,610-Speed 3440.50 samples/sec Loss 10.9133 LearningRate 0.1407 Epoch: 4 Global Step: 20400 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:06:30,543-Speed 3492.43 samples/sec Loss 10.9128 LearningRate 0.1406 Epoch: 4 Global Step: 20410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:33,501-Speed 3463.25 samples/sec Loss 11.0508 LearningRate 0.1406 Epoch: 4 Global Step: 20420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:36,455-Speed 3467.48 samples/sec Loss 11.0595 LearningRate 0.1406 Epoch: 4 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:39,407-Speed 3470.21 samples/sec Loss 11.0735 LearningRate 0.1406 Epoch: 4 Global Step: 20440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:42,355-Speed 3474.58 samples/sec Loss 11.0106 LearningRate 0.1405 Epoch: 4 Global Step: 20450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:45,296-Speed 3482.06 samples/sec Loss 10.9189 LearningRate 0.1405 Epoch: 4 Global Step: 20460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:48,237-Speed 3482.91 samples/sec Loss 10.9534 LearningRate 0.1405 Epoch: 4 Global Step: 20470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:51,189-Speed 3471.18 samples/sec Loss 11.0867 LearningRate 0.1405 Epoch: 4 Global Step: 20480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:54,138-Speed 3473.71 samples/sec Loss 11.0931 LearningRate 0.1404 Epoch: 4 Global Step: 20490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:06:57,077-Speed 3484.80 samples/sec Loss 10.9582 LearningRate 0.1404 Epoch: 4 Global Step: 20500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
19:07:00,032-Speed 3466.20 samples/sec Loss 11.0898 LearningRate 0.1404 Epoch: 4 Global Step: 20510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:07:02,968-Speed 3488.24 samples/sec Loss 11.1400 LearningRate 0.1404 Epoch: 4 Global Step: 20520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:07:05,895-Speed 3500.51 samples/sec Loss 11.2228 LearningRate 0.1403 Epoch: 4 Global Step: 20530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:07:08,828-Speed 3491.21 samples/sec Loss 11.1032 LearningRate 0.1403 Epoch: 4 Global Step: 20540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:07:11,760-Speed 3494.66 samples/sec Loss 11.0717 LearningRate 0.1403 Epoch: 4 Global Step: 20550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:07:14,692-Speed 3492.77 samples/sec Loss 11.1621 LearningRate 0.1402 Epoch: 4 Global Step: 20560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:07:17,624-Speed 3493.91 samples/sec Loss 11.2119 LearningRate 0.1402 Epoch: 4 Global Step: 20570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:07:20,586-Speed 3458.51 samples/sec Loss 11.3385 LearningRate 0.1402 Epoch: 4 Global Step: 20580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:07:23,550-Speed 3455.25 samples/sec Loss 11.2484 LearningRate 0.1402 Epoch: 4 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:26,533-Speed 3433.72 samples/sec Loss 11.2443 LearningRate 0.1401 Epoch: 4 Global Step: 20600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:29,462-Speed 3497.82 samples/sec Loss 11.2773 LearningRate 0.1401 Epoch: 4 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:32,405-Speed 3480.80 samples/sec Loss 11.2415 LearningRate 0.1401 Epoch: 4 Global Step: 20620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:35,351-Speed 3476.57 
samples/sec Loss 11.0377 LearningRate 0.1401 Epoch: 4 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:38,282-Speed 3495.32 samples/sec Loss 11.1364 LearningRate 0.1400 Epoch: 4 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:41,232-Speed 3472.40 samples/sec Loss 11.1898 LearningRate 0.1400 Epoch: 4 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:44,167-Speed 3488.76 samples/sec Loss 11.0097 LearningRate 0.1400 Epoch: 4 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:47,099-Speed 3493.46 samples/sec Loss 11.2932 LearningRate 0.1400 Epoch: 4 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:50,033-Speed 3491.23 samples/sec Loss 11.4410 LearningRate 0.1399 Epoch: 4 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-19 19:07:52,979-Speed 3476.84 samples/sec Loss 11.2966 LearningRate 0.1399 Epoch: 4 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:07:55,916-Speed 3487.67 samples/sec Loss 11.2923 LearningRate 0.1399 Epoch: 4 Global Step: 20700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:07:58,850-Speed 3490.85 samples/sec Loss 11.2598 LearningRate 0.1399 Epoch: 4 Global Step: 20710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:01,799-Speed 3473.36 samples/sec Loss 11.2415 LearningRate 0.1398 Epoch: 4 Global Step: 20720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:04,764-Speed 3454.60 samples/sec Loss 11.0973 LearningRate 0.1398 Epoch: 4 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:07,700-Speed 3489.03 samples/sec Loss 11.4317 LearningRate 0.1398 Epoch: 4 Global Step: 20740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:10,651-Speed 3470.74 samples/sec Loss 11.2437 LearningRate 
0.1397 Epoch: 4 Global Step: 20750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:13,582-Speed 3494.46 samples/sec Loss 11.2739 LearningRate 0.1397 Epoch: 4 Global Step: 20760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:16,596-Speed 3409.71 samples/sec Loss 11.2219 LearningRate 0.1397 Epoch: 4 Global Step: 20770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:19,570-Speed 3444.14 samples/sec Loss 11.3482 LearningRate 0.1397 Epoch: 4 Global Step: 20780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:22,543-Speed 3445.63 samples/sec Loss 11.3458 LearningRate 0.1396 Epoch: 4 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:25,633-Speed 3315.22 samples/sec Loss 11.2408 LearningRate 0.1396 Epoch: 4 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:28,695-Speed 3344.85 samples/sec Loss 11.2679 LearningRate 0.1396 Epoch: 4 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:31,636-Speed 3483.76 samples/sec Loss 11.1501 LearningRate 0.1396 Epoch: 4 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:34,571-Speed 3489.33 samples/sec Loss 11.2376 LearningRate 0.1395 Epoch: 4 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:37,552-Speed 3436.16 samples/sec Loss 11.2162 LearningRate 0.1395 Epoch: 4 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:40,503-Speed 3471.19 samples/sec Loss 11.1915 LearningRate 0.1395 Epoch: 4 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:43,439-Speed 3488.35 samples/sec Loss 11.1871 LearningRate 0.1395 Epoch: 4 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:46,371-Speed 3493.18 samples/sec Loss 11.3630 LearningRate 0.1394 Epoch: 4 Global Step: 
20870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:49,317-Speed 3477.00 samples/sec Loss 11.3091 LearningRate 0.1394 Epoch: 4 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:52,272-Speed 3465.70 samples/sec Loss 11.3246 LearningRate 0.1394 Epoch: 4 Global Step: 20890 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:08:55,203-Speed 3495.08 samples/sec Loss 11.2930 LearningRate 0.1394 Epoch: 4 Global Step: 20900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:08:58,135-Speed 3493.44 samples/sec Loss 11.2298 LearningRate 0.1393 Epoch: 4 Global Step: 20910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:01,093-Speed 3463.68 samples/sec Loss 11.1204 LearningRate 0.1393 Epoch: 4 Global Step: 20920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:04,026-Speed 3491.80 samples/sec Loss 11.3663 LearningRate 0.1393 Epoch: 4 Global Step: 20930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:06,957-Speed 3494.69 samples/sec Loss 11.3613 LearningRate 0.1392 Epoch: 4 Global Step: 20940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:09,895-Speed 3487.33 samples/sec Loss 11.1357 LearningRate 0.1392 Epoch: 4 Global Step: 20950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:12,858-Speed 3456.09 samples/sec Loss 11.1384 LearningRate 0.1392 Epoch: 4 Global Step: 20960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:15,809-Speed 3470.77 samples/sec Loss 11.4999 LearningRate 0.1392 Epoch: 4 Global Step: 20970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:18,744-Speed 3490.73 samples/sec Loss 11.2328 LearningRate 0.1391 Epoch: 4 Global Step: 20980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:21,712-Speed 3451.22 samples/sec Loss 11.3106 LearningRate 0.1391 Epoch: 4 Global Step: 20990 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-19 19:09:24,696-Speed 3431.44 samples/sec Loss 11.2125 LearningRate 0.1391 Epoch: 4 Global Step: 21000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:09:27,717-Speed 3391.13 samples/sec Loss 11.2999 LearningRate 0.1391 Epoch: 4 Global Step: 21010 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:09:30,645-Speed 3498.31 samples/sec Loss 11.2124 LearningRate 0.1390 Epoch: 4 Global Step: 21020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:09:33,566-Speed 3506.22 samples/sec Loss 11.3187 LearningRate 0.1390 Epoch: 4 Global Step: 21030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:36,502-Speed 3489.48 samples/sec Loss 11.2832 LearningRate 0.1390 Epoch: 4 Global Step: 21040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:39,495-Speed 3422.45 samples/sec Loss 11.4368 LearningRate 0.1390 Epoch: 4 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:42,425-Speed 3495.32 samples/sec Loss 11.2668 LearningRate 0.1389 Epoch: 4 Global Step: 21060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:45,370-Speed 3478.95 samples/sec Loss 11.2168 LearningRate 0.1389 Epoch: 4 Global Step: 21070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:48,300-Speed 3495.01 samples/sec Loss 11.5004 LearningRate 0.1389 Epoch: 4 Global Step: 21080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:51,240-Speed 3484.70 samples/sec Loss 11.1402 LearningRate 0.1388 Epoch: 4 Global Step: 21090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:54,210-Speed 3448.02 samples/sec Loss 11.4722 LearningRate 0.1388 Epoch: 4 Global Step: 21100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:09:57,143-Speed 3492.84 samples/sec Loss 11.1994 LearningRate 0.1388 Epoch: 4 Global Step: 21110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-19 19:10:00,101-Speed 3463.35 samples/sec Loss 11.3729 LearningRate 0.1388 Epoch: 4 Global Step: 21120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:03,060-Speed 3461.50 samples/sec Loss 11.2990 LearningRate 0.1387 Epoch: 4 Global Step: 21130 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:10:05,991-Speed 3494.16 samples/sec Loss 11.2294 LearningRate 0.1387 Epoch: 4 Global Step: 21140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:08,941-Speed 3471.44 samples/sec Loss 11.3666 LearningRate 0.1387 Epoch: 4 Global Step: 21150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:11,883-Speed 3482.56 samples/sec Loss 11.3661 LearningRate 0.1387 Epoch: 4 Global Step: 21160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:14,821-Speed 3485.60 samples/sec Loss 11.4191 LearningRate 0.1386 Epoch: 4 Global Step: 21170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:17,771-Speed 3472.61 samples/sec Loss 11.2566 LearningRate 0.1386 Epoch: 4 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:20,707-Speed 3488.08 samples/sec Loss 11.4612 LearningRate 0.1386 Epoch: 4 Global Step: 21190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:23,659-Speed 3470.04 samples/sec Loss 11.2788 LearningRate 0.1386 Epoch: 4 Global Step: 21200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:26,597-Speed 3486.49 samples/sec Loss 11.4685 LearningRate 0.1385 Epoch: 4 Global Step: 21210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:29,541-Speed 3478.58 samples/sec Loss 11.2729 LearningRate 0.1385 Epoch: 4 Global Step: 21220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:32,502-Speed 3459.00 samples/sec Loss 11.3260 LearningRate 0.1385 Epoch: 4 Global Step: 21230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:35,455-Speed 
3469.38 samples/sec Loss 11.5412 LearningRate 0.1385 Epoch: 4 Global Step: 21240 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:10:38,418-Speed 3456.04 samples/sec Loss 11.3090 LearningRate 0.1384 Epoch: 4 Global Step: 21250 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:10:41,338-Speed 3508.39 samples/sec Loss 11.3940 LearningRate 0.1384 Epoch: 4 Global Step: 21260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:44,306-Speed 3451.59 samples/sec Loss 11.2296 LearningRate 0.1384 Epoch: 4 Global Step: 21270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:47,295-Speed 3427.53 samples/sec Loss 11.2784 LearningRate 0.1383 Epoch: 4 Global Step: 21280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:50,241-Speed 3476.71 samples/sec Loss 11.1313 LearningRate 0.1383 Epoch: 4 Global Step: 21290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:53,302-Speed 3346.33 samples/sec Loss 11.2027 LearningRate 0.1383 Epoch: 4 Global Step: 21300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:56,271-Speed 3449.39 samples/sec Loss 11.2794 LearningRate 0.1383 Epoch: 4 Global Step: 21310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:10:59,231-Speed 3460.36 samples/sec Loss 11.5038 LearningRate 0.1382 Epoch: 4 Global Step: 21320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:02,180-Speed 3473.54 samples/sec Loss 11.4440 LearningRate 0.1382 Epoch: 4 Global Step: 21330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:05,112-Speed 3493.26 samples/sec Loss 11.3757 LearningRate 0.1382 Epoch: 4 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:08,054-Speed 3482.28 samples/sec Loss 11.4011 LearningRate 0.1382 Epoch: 4 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:10,998-Speed 3479.31 samples/sec Loss 
11.3173 LearningRate 0.1381 Epoch: 4 Global Step: 21360 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:11:13,922-Speed 3502.39 samples/sec Loss 11.1331 LearningRate 0.1381 Epoch: 4 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:16,853-Speed 3494.22 samples/sec Loss 11.3908 LearningRate 0.1381 Epoch: 4 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:19,791-Speed 3486.76 samples/sec Loss 11.3462 LearningRate 0.1381 Epoch: 4 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:22,726-Speed 3490.04 samples/sec Loss 11.2569 LearningRate 0.1380 Epoch: 4 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:25,670-Speed 3479.31 samples/sec Loss 11.3082 LearningRate 0.1380 Epoch: 4 Global Step: 21410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:28,603-Speed 3491.43 samples/sec Loss 11.3566 LearningRate 0.1380 Epoch: 4 Global Step: 21420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:31,586-Speed 3433.87 samples/sec Loss 11.1107 LearningRate 0.1380 Epoch: 4 Global Step: 21430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:34,549-Speed 3457.25 samples/sec Loss 11.0822 LearningRate 0.1379 Epoch: 4 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:37,496-Speed 3476.51 samples/sec Loss 11.4323 LearningRate 0.1379 Epoch: 4 Global Step: 21450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:40,439-Speed 3480.37 samples/sec Loss 11.4236 LearningRate 0.1379 Epoch: 4 Global Step: 21460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:43,404-Speed 3453.94 samples/sec Loss 11.4251 LearningRate 0.1378 Epoch: 4 Global Step: 21470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:11:46,409-Speed 3409.18 samples/sec Loss 11.2534 LearningRate 0.1378 
Epoch: 4 Global Step: 21480 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:11:49,349-Speed 3484.10 samples/sec Loss 11.2120 LearningRate 0.1378 Epoch: 4 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:52,282-Speed 3492.13 samples/sec Loss 11.4299 LearningRate 0.1378 Epoch: 4 Global Step: 21500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:55,214-Speed 3493.86 samples/sec Loss 11.2881 LearningRate 0.1377 Epoch: 4 Global Step: 21510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:11:58,161-Speed 3475.54 samples/sec Loss 11.2440 LearningRate 0.1377 Epoch: 4 Global Step: 21520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:12:01,109-Speed 3474.39 samples/sec Loss 11.2314 LearningRate 0.1377 Epoch: 4 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:12:04,049-Speed 3483.69 samples/sec Loss 10.9805 LearningRate 0.1377 Epoch: 4 Global Step: 21540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:12:06,992-Speed 3479.66 samples/sec Loss 11.2939 LearningRate 0.1376 Epoch: 4 Global Step: 21550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:12:09,939-Speed 3475.91 samples/sec Loss 11.3099 LearningRate 0.1376 Epoch: 4 Global Step: 21560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:12:12,878-Speed 3485.18 samples/sec Loss 11.4921 LearningRate 0.1376 Epoch: 4 Global Step: 21570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:12:15,834-Speed 3465.38 samples/sec Loss 11.2871 LearningRate 0.1376 Epoch: 4 Global Step: 21580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:12:18,755-Speed 3507.26 samples/sec Loss 11.2158 LearningRate 0.1375 Epoch: 4 Global Step: 21590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:12:21,689-Speed 3490.30 samples/sec Loss 11.3469 LearningRate 0.1375 Epoch: 4 Global Step: 21600 
Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:24,620-Speed 3495.47 samples/sec Loss 11.2023 LearningRate 0.1375 Epoch: 4 Global Step: 21610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:27,552-Speed 3492.71 samples/sec Loss 11.2624 LearningRate 0.1375 Epoch: 4 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:30,514-Speed 3458.33 samples/sec Loss 11.3269 LearningRate 0.1374 Epoch: 4 Global Step: 21630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:33,477-Speed 3456.45 samples/sec Loss 11.1955 LearningRate 0.1374 Epoch: 4 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:36,414-Speed 3487.05 samples/sec Loss 11.1449 LearningRate 0.1374 Epoch: 4 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:39,359-Speed 3479.49 samples/sec Loss 11.1855 LearningRate 0.1374 Epoch: 4 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:42,294-Speed 3489.55 samples/sec Loss 11.3670 LearningRate 0.1373 Epoch: 4 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:45,312-Speed 3393.91 samples/sec Loss 11.3066 LearningRate 0.1373 Epoch: 4 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:48,281-Speed 3449.72 samples/sec Loss 11.2268 LearningRate 0.1373 Epoch: 4 Global Step: 21690 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:12:51,223-Speed 3481.30 samples/sec Loss 11.2435 LearningRate 0.1372 Epoch: 4 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:54,162-Speed 3485.60 samples/sec Loss 11.3571 LearningRate 0.1372 Epoch: 4 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:12:57,130-Speed 3451.79 samples/sec Loss 11.2379 LearningRate 0.1372 Epoch: 4 Global Step: 21720 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-01-19 19:13:00,114-Speed 3432.12 samples/sec Loss 11.3399 LearningRate 0.1372 Epoch: 4 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:03,118-Speed 3409.64 samples/sec Loss 11.2618 LearningRate 0.1371 Epoch: 4 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:06,183-Speed 3341.21 samples/sec Loss 11.1585 LearningRate 0.1371 Epoch: 4 Global Step: 21750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:09,160-Speed 3440.66 samples/sec Loss 11.3346 LearningRate 0.1371 Epoch: 4 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:12,099-Speed 3485.85 samples/sec Loss 11.3497 LearningRate 0.1371 Epoch: 4 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:15,053-Speed 3466.95 samples/sec Loss 11.1547 LearningRate 0.1370 Epoch: 4 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:17,992-Speed 3485.89 samples/sec Loss 11.1762 LearningRate 0.1370 Epoch: 4 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:20,930-Speed 3486.23 samples/sec Loss 11.2543 LearningRate 0.1370 Epoch: 4 Global Step: 21800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:23,918-Speed 3429.02 samples/sec Loss 11.2908 LearningRate 0.1370 Epoch: 4 Global Step: 21810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:26,862-Speed 3478.47 samples/sec Loss 11.0797 LearningRate 0.1369 Epoch: 4 Global Step: 21820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:29,824-Speed 3458.36 samples/sec Loss 11.3656 LearningRate 0.1369 Epoch: 4 Global Step: 21830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:32,800-Speed 3441.83 samples/sec Loss 11.1886 LearningRate 0.1369 Epoch: 4 Global Step: 21840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-19 19:13:35,764-Speed 3454.83 samples/sec Loss 11.1333 LearningRate 0.1369 Epoch: 4 Global Step: 21850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:13:38,703-Speed 3485.22 samples/sec Loss 11.2657 LearningRate 0.1368 Epoch: 4 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:13:41,646-Speed 3481.51 samples/sec Loss 11.3027 LearningRate 0.1368 Epoch: 4 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:13:44,585-Speed 3484.24 samples/sec Loss 11.2622 LearningRate 0.1368 Epoch: 4 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:13:47,517-Speed 3493.67 samples/sec Loss 11.1352 LearningRate 0.1367 Epoch: 4 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:13:50,454-Speed 3487.66 samples/sec Loss 11.1816 LearningRate 0.1367 Epoch: 4 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:13:53,393-Speed 3484.97 samples/sec Loss 11.2240 LearningRate 0.1367 Epoch: 4 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:13:56,329-Speed 3488.68 samples/sec Loss 11.2339 LearningRate 0.1367 Epoch: 4 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:13:59,272-Speed 3479.95 samples/sec Loss 11.2184 LearningRate 0.1366 Epoch: 4 Global Step: 21930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:14:02,207-Speed 3489.79 samples/sec Loss 11.2475 LearningRate 0.1366 Epoch: 4 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:14:05,236-Speed 3381.27 samples/sec Loss 11.0781 LearningRate 0.1366 Epoch: 4 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:14:08,174-Speed 3486.29 samples/sec Loss 11.2520 LearningRate 0.1366 Epoch: 4 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:14:11,115-Speed 3483.99 
samples/sec Loss 11.0608 LearningRate 0.1365 Epoch: 4 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:14:14,103-Speed 3427.39 samples/sec Loss 11.1693 LearningRate 0.1365 Epoch: 4 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:14:17,132-Speed 3381.64 samples/sec Loss 11.2219 LearningRate 0.1365 Epoch: 4 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:14:20,084-Speed 3469.57 samples/sec Loss 11.1619 LearningRate 0.1365 Epoch: 4 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:15:02,938-[lfw][22000]XNorm: 22.271905 Training: 2022-01-19 19:15:02,938-[lfw][22000]Accuracy-Flip: 0.99700+-0.00245 Training: 2022-01-19 19:15:02,939-[lfw][22000]Accuracy-Highest: 0.99700 Training: 2022-01-19 19:15:52,970-[cfp_fp][22000]XNorm: 19.202346 Training: 2022-01-19 19:15:52,971-[cfp_fp][22000]Accuracy-Flip: 0.94629+-0.01354 Training: 2022-01-19 19:15:52,971-[cfp_fp][22000]Accuracy-Highest: 0.94629 Training: 2022-01-19 19:16:36,024-[agedb_30][22000]XNorm: 22.081090 Training: 2022-01-19 19:16:36,024-[agedb_30][22000]Accuracy-Flip: 0.96067+-0.00920 Training: 2022-01-19 19:16:36,025-[agedb_30][22000]Accuracy-Highest: 0.96783 Training: 2022-01-19 19:16:38,991-Speed 73.72 samples/sec Loss 11.3397 LearningRate 0.1364 Epoch: 4 Global Step: 22010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:16:41,929-Speed 3485.79 samples/sec Loss 11.1754 LearningRate 0.1364 Epoch: 4 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:16:44,872-Speed 3480.88 samples/sec Loss 11.2426 LearningRate 0.1364 Epoch: 4 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:16:47,800-Speed 3499.04 samples/sec Loss 11.3770 LearningRate 0.1364 Epoch: 4 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:16:50,724-Speed 3502.09 samples/sec Loss 11.2711 
LearningRate 0.1363 Epoch: 4 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:16:53,653-Speed 3497.57 samples/sec Loss 11.2359 LearningRate 0.1363 Epoch: 4 Global Step: 22060 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:16:56,590-Speed 3486.94 samples/sec Loss 11.2234 LearningRate 0.1363 Epoch: 4 Global Step: 22070 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:16:59,523-Speed 3492.56 samples/sec Loss 11.1879 LearningRate 0.1363 Epoch: 4 Global Step: 22080 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:17:02,443-Speed 3507.58 samples/sec Loss 11.3986 LearningRate 0.1362 Epoch: 4 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:05,386-Speed 3480.92 samples/sec Loss 11.1949 LearningRate 0.1362 Epoch: 4 Global Step: 22100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:08,324-Speed 3486.90 samples/sec Loss 11.2468 LearningRate 0.1362 Epoch: 4 Global Step: 22110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:11,257-Speed 3491.38 samples/sec Loss 11.2852 LearningRate 0.1361 Epoch: 4 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:14,196-Speed 3484.88 samples/sec Loss 11.2556 LearningRate 0.1361 Epoch: 4 Global Step: 22130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:17,167-Speed 3448.64 samples/sec Loss 11.2533 LearningRate 0.1361 Epoch: 4 Global Step: 22140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:20,153-Speed 3429.97 samples/sec Loss 11.0686 LearningRate 0.1361 Epoch: 4 Global Step: 22150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:23,119-Speed 3453.57 samples/sec Loss 11.2646 LearningRate 0.1360 Epoch: 4 Global Step: 22160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:26,069-Speed 3471.17 samples/sec Loss 11.1568 LearningRate 0.1360 Epoch: 4 
Global Step: 22170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:29,030-Speed 3460.23 samples/sec Loss 11.2834 LearningRate 0.1360 Epoch: 4 Global Step: 22180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:31,974-Speed 3479.19 samples/sec Loss 11.1700 LearningRate 0.1360 Epoch: 4 Global Step: 22190 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:17:34,929-Speed 3465.65 samples/sec Loss 11.3392 LearningRate 0.1359 Epoch: 4 Global Step: 22200 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:17:37,862-Speed 3492.94 samples/sec Loss 11.3497 LearningRate 0.1359 Epoch: 4 Global Step: 22210 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:17:40,784-Speed 3505.29 samples/sec Loss 11.0436 LearningRate 0.1359 Epoch: 4 Global Step: 22220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:43,749-Speed 3454.19 samples/sec Loss 11.3958 LearningRate 0.1359 Epoch: 4 Global Step: 22230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:46,704-Speed 3466.85 samples/sec Loss 11.0929 LearningRate 0.1358 Epoch: 4 Global Step: 22240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:49,645-Speed 3482.27 samples/sec Loss 11.0012 LearningRate 0.1358 Epoch: 4 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:52,587-Speed 3482.09 samples/sec Loss 11.0306 LearningRate 0.1358 Epoch: 4 Global Step: 22260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:55,607-Speed 3390.96 samples/sec Loss 11.1576 LearningRate 0.1358 Epoch: 4 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:17:58,554-Speed 3476.39 samples/sec Loss 11.1486 LearningRate 0.1357 Epoch: 4 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:01,494-Speed 3484.18 samples/sec Loss 11.1237 LearningRate 0.1357 Epoch: 4 Global Step: 22290 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:04,436-Speed 3481.27 samples/sec Loss 11.3237 LearningRate 0.1357 Epoch: 4 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:07,444-Speed 3405.16 samples/sec Loss 11.1105 LearningRate 0.1357 Epoch: 4 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:10,405-Speed 3459.42 samples/sec Loss 11.1945 LearningRate 0.1356 Epoch: 4 Global Step: 22320 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:18:13,330-Speed 3502.15 samples/sec Loss 11.2489 LearningRate 0.1356 Epoch: 4 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:16,274-Speed 3478.83 samples/sec Loss 11.1749 LearningRate 0.1356 Epoch: 4 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:19,221-Speed 3475.48 samples/sec Loss 11.1823 LearningRate 0.1355 Epoch: 4 Global Step: 22350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:22,166-Speed 3478.23 samples/sec Loss 11.2111 LearningRate 0.1355 Epoch: 4 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:25,105-Speed 3484.52 samples/sec Loss 11.1249 LearningRate 0.1355 Epoch: 4 Global Step: 22370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:28,063-Speed 3463.75 samples/sec Loss 11.1156 LearningRate 0.1355 Epoch: 4 Global Step: 22380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:31,038-Speed 3442.98 samples/sec Loss 11.3303 LearningRate 0.1354 Epoch: 4 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:33,969-Speed 3494.35 samples/sec Loss 11.0378 LearningRate 0.1354 Epoch: 4 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:36,944-Speed 3442.56 samples/sec Loss 11.1662 LearningRate 0.1354 Epoch: 4 Global Step: 22410 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-01-19 19:18:39,928-Speed 3433.42 samples/sec Loss 11.1400 LearningRate 0.1354 Epoch: 4 Global Step: 22420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:42,866-Speed 3485.70 samples/sec Loss 11.1665 LearningRate 0.1353 Epoch: 4 Global Step: 22430 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:18:45,791-Speed 3502.14 samples/sec Loss 11.2859 LearningRate 0.1353 Epoch: 4 Global Step: 22440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:48,725-Speed 3491.39 samples/sec Loss 10.9971 LearningRate 0.1353 Epoch: 4 Global Step: 22450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:51,654-Speed 3496.86 samples/sec Loss 11.3687 LearningRate 0.1353 Epoch: 4 Global Step: 22460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:54,585-Speed 3494.46 samples/sec Loss 11.3202 LearningRate 0.1352 Epoch: 4 Global Step: 22470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:18:57,515-Speed 3495.73 samples/sec Loss 11.2625 LearningRate 0.1352 Epoch: 4 Global Step: 22480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:19:00,450-Speed 3489.59 samples/sec Loss 11.3206 LearningRate 0.1352 Epoch: 4 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:19:03,952-Speed 2925.25 samples/sec Loss 11.1536 LearningRate 0.1352 Epoch: 4 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:19:07,931-Speed 2574.05 samples/sec Loss 11.2305 LearningRate 0.1351 Epoch: 4 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:19:11,154-Speed 3177.19 samples/sec Loss 11.1640 LearningRate 0.1351 Epoch: 4 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 19:19:14,138-Speed 3432.58 samples/sec Loss 11.1911 LearningRate 0.1351 Epoch: 4 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-19 
19:19:17,073-Speed 3489.84 samples/sec Loss 11.0748 LearningRate 0.1351 Epoch: 4 Global Step: 22540 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:19:20,006-Speed 3492.67 samples/sec Loss 11.0902 LearningRate 0.1350 Epoch: 4 Global Step: 22550 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-19 19:19:22,977-Speed 3447.35 samples/sec Loss 11.2326 LearningRate 0.1350 Epoch: 4 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:25,951-Speed 3444.04 samples/sec Loss 11.2034 LearningRate 0.1350 Epoch: 4 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:28,960-Speed 3405.36 samples/sec Loss 11.1458 LearningRate 0.1349 Epoch: 4 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:31,912-Speed 3468.97 samples/sec Loss 11.0297 LearningRate 0.1349 Epoch: 4 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:34,871-Speed 3461.57 samples/sec Loss 11.0702 LearningRate 0.1349 Epoch: 4 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:37,802-Speed 3494.75 samples/sec Loss 11.1848 LearningRate 0.1349 Epoch: 4 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:40,868-Speed 3340.58 samples/sec Loss 11.0299 LearningRate 0.1348 Epoch: 4 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:43,801-Speed 3491.93 samples/sec Loss 11.2369 LearningRate 0.1348 Epoch: 4 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:46,750-Speed 3473.13 samples/sec Loss 11.1837 LearningRate 0.1348 Epoch: 4 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:49,679-Speed 3496.97 samples/sec Loss 11.2286 LearningRate 0.1348 Epoch: 4 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:52,638-Speed 3461.56 
samples/sec Loss 11.2060 LearningRate 0.1347 Epoch: 4 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:55,621-Speed 3433.95 samples/sec Loss 11.1802 LearningRate 0.1347 Epoch: 4 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:19:58,558-Speed 3488.39 samples/sec Loss 11.1567 LearningRate 0.1347 Epoch: 4 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:01,502-Speed 3478.21 samples/sec Loss 11.0817 LearningRate 0.1347 Epoch: 4 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:04,459-Speed 3463.83 samples/sec Loss 11.0636 LearningRate 0.1346 Epoch: 4 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:07,435-Speed 3441.88 samples/sec Loss 11.1390 LearningRate 0.1346 Epoch: 4 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:10,371-Speed 3488.60 samples/sec Loss 11.2443 LearningRate 0.1346 Epoch: 4 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:13,335-Speed 3456.49 samples/sec Loss 11.2998 LearningRate 0.1346 Epoch: 4 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:16,263-Speed 3497.22 samples/sec Loss 11.2778 LearningRate 0.1345 Epoch: 4 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:19,206-Speed 3481.92 samples/sec Loss 11.1685 LearningRate 0.1345 Epoch: 4 Global Step: 22750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:22,128-Speed 3505.16 samples/sec Loss 11.1925 LearningRate 0.1345 Epoch: 4 Global Step: 22760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:25,055-Speed 3498.36 samples/sec Loss 11.0654 LearningRate 0.1345 Epoch: 4 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:28,016-Speed 3460.06 samples/sec Loss 11.1992 
LearningRate 0.1344 Epoch: 4 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:30,950-Speed 3490.65 samples/sec Loss 11.1755 LearningRate 0.1344 Epoch: 4 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:33,890-Speed 3484.06 samples/sec Loss 11.1946 LearningRate 0.1344 Epoch: 4 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:36,857-Speed 3452.15 samples/sec Loss 11.0574 LearningRate 0.1344 Epoch: 4 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:39,788-Speed 3494.58 samples/sec Loss 11.0773 LearningRate 0.1343 Epoch: 4 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:42,754-Speed 3453.01 samples/sec Loss 11.0589 LearningRate 0.1343 Epoch: 4 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:45,732-Speed 3439.24 samples/sec Loss 11.0292 LearningRate 0.1343 Epoch: 4 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:48,756-Speed 3389.25 samples/sec Loss 11.0612 LearningRate 0.1342 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:51,711-Speed 3465.45 samples/sec Loss 11.0498 LearningRate 0.1342 Epoch: 4 Global Step: 22860 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:20:54,633-Speed 3505.23 samples/sec Loss 11.2108 LearningRate 0.1342 Epoch: 4 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:20:57,589-Speed 3465.95 samples/sec Loss 11.3754 LearningRate 0.1342 Epoch: 4 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:00,552-Speed 3456.27 samples/sec Loss 11.0051 LearningRate 0.1341 Epoch: 4 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:03,501-Speed 3473.52 samples/sec Loss 11.1517 LearningRate 0.1341 Epoch: 4 
Global Step: 22900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:06,480-Speed 3438.14 samples/sec Loss 10.9966 LearningRate 0.1341 Epoch: 4 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:09,407-Speed 3498.59 samples/sec Loss 11.0658 LearningRate 0.1341 Epoch: 4 Global Step: 22920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:12,349-Speed 3481.65 samples/sec Loss 11.0725 LearningRate 0.1340 Epoch: 4 Global Step: 22930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:15,324-Speed 3442.82 samples/sec Loss 11.0846 LearningRate 0.1340 Epoch: 4 Global Step: 22940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:18,334-Speed 3404.07 samples/sec Loss 11.1583 LearningRate 0.1340 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:21,273-Speed 3484.85 samples/sec Loss 11.2986 LearningRate 0.1340 Epoch: 4 Global Step: 22960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:24,224-Speed 3471.05 samples/sec Loss 11.1224 LearningRate 0.1339 Epoch: 4 Global Step: 22970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:27,181-Speed 3463.59 samples/sec Loss 11.1029 LearningRate 0.1339 Epoch: 4 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:30,119-Speed 3486.36 samples/sec Loss 11.0025 LearningRate 0.1339 Epoch: 4 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:33,051-Speed 3493.40 samples/sec Loss 11.1984 LearningRate 0.1339 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:36,004-Speed 3468.43 samples/sec Loss 11.0158 LearningRate 0.1338 Epoch: 4 Global Step: 23010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:38,960-Speed 3464.42 samples/sec Loss 11.0547 LearningRate 0.1338 Epoch: 4 Global Step: 23020 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-01-19 19:21:41,925-Speed 3454.98 samples/sec Loss 11.0448 LearningRate 0.1338 Epoch: 4 Global Step: 23030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:44,865-Speed 3484.48 samples/sec Loss 11.1385 LearningRate 0.1338 Epoch: 4 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:47,793-Speed 3497.76 samples/sec Loss 11.1448 LearningRate 0.1337 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:50,745-Speed 3469.93 samples/sec Loss 11.1135 LearningRate 0.1337 Epoch: 4 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:53,705-Speed 3460.52 samples/sec Loss 10.9198 LearningRate 0.1337 Epoch: 4 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:21:56,695-Speed 3425.41 samples/sec Loss 11.0835 LearningRate 0.1337 Epoch: 4 Global Step: 23080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:21:59,641-Speed 3475.94 samples/sec Loss 10.9919 LearningRate 0.1336 Epoch: 4 Global Step: 23090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:22:02,589-Speed 3475.10 samples/sec Loss 10.9920 LearningRate 0.1336 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:22:05,546-Speed 3464.31 samples/sec Loss 11.0834 LearningRate 0.1336 Epoch: 4 Global Step: 23110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:22:08,505-Speed 3461.41 samples/sec Loss 10.9460 LearningRate 0.1336 Epoch: 4 Global Step: 23120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:22:11,443-Speed 3486.71 samples/sec Loss 11.0549 LearningRate 0.1335 Epoch: 4 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:22:14,406-Speed 3457.29 samples/sec Loss 11.0560 LearningRate 0.1335 Epoch: 4 Global Step: 23140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-19 19:22:17,340-Speed 3491.30 samples/sec Loss 10.9276 LearningRate 0.1335 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:22:20,284-Speed 3478.10 samples/sec Loss 11.0066 LearningRate 0.1334 Epoch: 4 Global Step: 23160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:22:23,218-Speed 3492.00 samples/sec Loss 11.1237 LearningRate 0.1334 Epoch: 4 Global Step: 23170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:22:26,147-Speed 3496.69 samples/sec Loss 11.1674 LearningRate 0.1334 Epoch: 4 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:29,127-Speed 3437.25 samples/sec Loss 10.9905 LearningRate 0.1334 Epoch: 4 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:32,081-Speed 3466.90 samples/sec Loss 11.1625 LearningRate 0.1333 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:35,080-Speed 3415.36 samples/sec Loss 11.0774 LearningRate 0.1333 Epoch: 4 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:38,064-Speed 3433.45 samples/sec Loss 11.0477 LearningRate 0.1333 Epoch: 4 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:41,047-Speed 3433.12 samples/sec Loss 10.9349 LearningRate 0.1333 Epoch: 4 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:44,009-Speed 3459.13 samples/sec Loss 10.9247 LearningRate 0.1332 Epoch: 4 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:46,973-Speed 3455.04 samples/sec Loss 10.8924 LearningRate 0.1332 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:49,931-Speed 3462.20 samples/sec Loss 10.9140 LearningRate 0.1332 Epoch: 4 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:52,876-Speed 
3478.96 samples/sec Loss 11.1692 LearningRate 0.1332 Epoch: 4 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:22:55,830-Speed 3467.53 samples/sec Loss 10.8882 LearningRate 0.1331 Epoch: 4 Global Step: 23280 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:22:58,772-Speed 3481.62 samples/sec Loss 11.0376 LearningRate 0.1331 Epoch: 4 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:01,756-Speed 3432.40 samples/sec Loss 11.0935 LearningRate 0.1331 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:04,691-Speed 3490.75 samples/sec Loss 11.1027 LearningRate 0.1331 Epoch: 4 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:07,615-Speed 3502.49 samples/sec Loss 11.1042 LearningRate 0.1330 Epoch: 4 Global Step: 23320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:10,565-Speed 3472.35 samples/sec Loss 10.9996 LearningRate 0.1330 Epoch: 4 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:13,536-Speed 3447.23 samples/sec Loss 11.0307 LearningRate 0.1330 Epoch: 4 Global Step: 23340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:16,566-Speed 3381.05 samples/sec Loss 11.0050 LearningRate 0.1330 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:19,518-Speed 3469.22 samples/sec Loss 11.1610 LearningRate 0.1329 Epoch: 4 Global Step: 23360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:22,462-Speed 3479.49 samples/sec Loss 11.2315 LearningRate 0.1329 Epoch: 4 Global Step: 23370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:25,401-Speed 3484.82 samples/sec Loss 10.9843 LearningRate 0.1329 Epoch: 4 Global Step: 23380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:28,374-Speed 3445.53 samples/sec Loss 11.1123 
LearningRate 0.1329 Epoch: 4 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:31,321-Speed 3474.85 samples/sec Loss 10.9477 LearningRate 0.1328 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:34,253-Speed 3494.53 samples/sec Loss 10.9410 LearningRate 0.1328 Epoch: 4 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:23:37,251-Speed 3416.06 samples/sec Loss 11.0780 LearningRate 0.1328 Epoch: 4 Global Step: 23420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:40,269-Speed 3394.26 samples/sec Loss 10.8497 LearningRate 0.1328 Epoch: 4 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:43,206-Speed 3487.44 samples/sec Loss 11.0843 LearningRate 0.1327 Epoch: 4 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:46,161-Speed 3465.61 samples/sec Loss 11.0728 LearningRate 0.1327 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:49,146-Speed 3432.56 samples/sec Loss 11.0788 LearningRate 0.1327 Epoch: 4 Global Step: 23460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:52,079-Speed 3491.91 samples/sec Loss 11.1941 LearningRate 0.1326 Epoch: 4 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:55,009-Speed 3495.20 samples/sec Loss 10.9501 LearningRate 0.1326 Epoch: 4 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:23:57,960-Speed 3471.29 samples/sec Loss 11.1973 LearningRate 0.1326 Epoch: 4 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:00,896-Speed 3488.29 samples/sec Loss 10.9222 LearningRate 0.1326 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:03,855-Speed 3462.37 samples/sec Loss 11.0178 LearningRate 0.1325 Epoch: 4 
Global Step: 23510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:06,788-Speed 3491.47 samples/sec Loss 11.0160 LearningRate 0.1325 Epoch: 4 Global Step: 23520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:24:09,713-Speed 3502.15 samples/sec Loss 11.0444 LearningRate 0.1325 Epoch: 4 Global Step: 23530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:12,708-Speed 3419.31 samples/sec Loss 11.1428 LearningRate 0.1325 Epoch: 4 Global Step: 23540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:15,641-Speed 3491.94 samples/sec Loss 11.2365 LearningRate 0.1324 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:18,592-Speed 3471.85 samples/sec Loss 11.0892 LearningRate 0.1324 Epoch: 4 Global Step: 23560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:21,540-Speed 3474.03 samples/sec Loss 10.8382 LearningRate 0.1324 Epoch: 4 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:24,460-Speed 3507.32 samples/sec Loss 10.9812 LearningRate 0.1324 Epoch: 4 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:27,391-Speed 3495.21 samples/sec Loss 11.0432 LearningRate 0.1323 Epoch: 4 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:30,339-Speed 3475.73 samples/sec Loss 11.0120 LearningRate 0.1323 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:33,270-Speed 3494.40 samples/sec Loss 11.1665 LearningRate 0.1323 Epoch: 4 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:36,207-Speed 3487.92 samples/sec Loss 11.1114 LearningRate 0.1323 Epoch: 4 Global Step: 23620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:39,168-Speed 3458.82 samples/sec Loss 11.1724 LearningRate 0.1322 Epoch: 4 Global Step: 23630 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-01-19 19:24:42,139-Speed 3447.35 samples/sec Loss 11.0184 LearningRate 0.1322 Epoch: 4 Global Step: 23640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:45,076-Speed 3487.00 samples/sec Loss 10.8594 LearningRate 0.1322 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:48,015-Speed 3485.53 samples/sec Loss 10.9013 LearningRate 0.1322 Epoch: 4 Global Step: 23660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:51,013-Speed 3417.01 samples/sec Loss 10.8499 LearningRate 0.1321 Epoch: 4 Global Step: 23670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:24:53,983-Speed 3448.30 samples/sec Loss 10.8713 LearningRate 0.1321 Epoch: 4 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:56,925-Speed 3482.46 samples/sec Loss 11.1170 LearningRate 0.1321 Epoch: 4 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:24:59,860-Speed 3489.22 samples/sec Loss 11.0660 LearningRate 0.1321 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:02,795-Speed 3490.77 samples/sec Loss 11.0694 LearningRate 0.1320 Epoch: 4 Global Step: 23710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:05,728-Speed 3491.05 samples/sec Loss 10.9109 LearningRate 0.1320 Epoch: 4 Global Step: 23720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:08,668-Speed 3484.86 samples/sec Loss 11.0911 LearningRate 0.1320 Epoch: 4 Global Step: 23730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:11,620-Speed 3469.49 samples/sec Loss 10.8814 LearningRate 0.1320 Epoch: 4 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:14,588-Speed 3450.92 samples/sec Loss 10.8726 LearningRate 0.1319 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-19 19:25:17,527-Speed 3485.61 samples/sec Loss 10.8820 LearningRate 0.1319 Epoch: 4 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:20,521-Speed 3420.61 samples/sec Loss 11.0943 LearningRate 0.1319 Epoch: 4 Global Step: 23770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:23,461-Speed 3483.85 samples/sec Loss 10.9294 LearningRate 0.1318 Epoch: 4 Global Step: 23780 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:25:26,394-Speed 3493.06 samples/sec Loss 10.8727 LearningRate 0.1318 Epoch: 4 Global Step: 23790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:25:29,321-Speed 3498.72 samples/sec Loss 10.7599 LearningRate 0.1318 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:32,254-Speed 3492.45 samples/sec Loss 11.0268 LearningRate 0.1318 Epoch: 4 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:35,196-Speed 3481.15 samples/sec Loss 11.0615 LearningRate 0.1317 Epoch: 4 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:38,206-Speed 3403.00 samples/sec Loss 11.2224 LearningRate 0.1317 Epoch: 4 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:41,141-Speed 3489.87 samples/sec Loss 11.1067 LearningRate 0.1317 Epoch: 4 Global Step: 23840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:44,078-Speed 3488.27 samples/sec Loss 10.8160 LearningRate 0.1317 Epoch: 4 Global Step: 23850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:47,091-Speed 3399.28 samples/sec Loss 11.1220 LearningRate 0.1316 Epoch: 4 Global Step: 23860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:50,029-Speed 3485.50 samples/sec Loss 10.9178 LearningRate 0.1316 Epoch: 4 Global Step: 23870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:52,969-Speed 
3484.77 samples/sec Loss 10.8706 LearningRate 0.1316 Epoch: 4 Global Step: 23880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:55,904-Speed 3489.85 samples/sec Loss 11.0169 LearningRate 0.1316 Epoch: 4 Global Step: 23890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:25:58,859-Speed 3466.21 samples/sec Loss 10.9921 LearningRate 0.1315 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:26:01,785-Speed 3499.87 samples/sec Loss 10.9882 LearningRate 0.1315 Epoch: 4 Global Step: 23910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:04,722-Speed 3487.74 samples/sec Loss 10.9575 LearningRate 0.1315 Epoch: 4 Global Step: 23920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:07,667-Speed 3478.54 samples/sec Loss 11.0825 LearningRate 0.1315 Epoch: 4 Global Step: 23930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:10,604-Speed 3487.68 samples/sec Loss 10.9195 LearningRate 0.1314 Epoch: 4 Global Step: 23940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:13,549-Speed 3478.47 samples/sec Loss 10.7925 LearningRate 0.1314 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:16,496-Speed 3474.57 samples/sec Loss 10.9603 LearningRate 0.1314 Epoch: 4 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:19,476-Speed 3439.11 samples/sec Loss 10.9858 LearningRate 0.1314 Epoch: 4 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:22,440-Speed 3454.89 samples/sec Loss 11.0128 LearningRate 0.1313 Epoch: 4 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:25,381-Speed 3482.89 samples/sec Loss 10.9227 LearningRate 0.1313 Epoch: 4 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:26:28,331-Speed 3472.19 samples/sec Loss 
10.8307 LearningRate 0.1313 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:27:10,991-[lfw][24000]XNorm: 22.466048 Training: 2022-01-19 19:27:10,992-[lfw][24000]Accuracy-Flip: 0.99633+-0.00267 Training: 2022-01-19 19:27:10,992-[lfw][24000]Accuracy-Highest: 0.99700 Training: 2022-01-19 19:28:00,581-[cfp_fp][24000]XNorm: 19.435566 Training: 2022-01-19 19:28:00,582-[cfp_fp][24000]Accuracy-Flip: 0.93757+-0.01046 Training: 2022-01-19 19:28:00,583-[cfp_fp][24000]Accuracy-Highest: 0.94629 Training: 2022-01-19 19:28:43,271-[agedb_30][24000]XNorm: 22.132325 Training: 2022-01-19 19:28:43,271-[agedb_30][24000]Accuracy-Flip: 0.96833+-0.00707 Training: 2022-01-19 19:28:43,272-[agedb_30][24000]Accuracy-Highest: 0.96833 Training: 2022-01-19 19:28:46,208-Speed 74.27 samples/sec Loss 11.0039 LearningRate 0.1313 Epoch: 4 Global Step: 24010 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:28:49,207-Speed 3415.51 samples/sec Loss 11.0222 LearningRate 0.1312 Epoch: 4 Global Step: 24020 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:28:52,124-Speed 3511.18 samples/sec Loss 10.8314 LearningRate 0.1312 Epoch: 4 Global Step: 24030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:28:55,053-Speed 3497.46 samples/sec Loss 10.8930 LearningRate 0.1312 Epoch: 4 Global Step: 24040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:28:57,993-Speed 3483.69 samples/sec Loss 11.0338 LearningRate 0.1312 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:00,983-Speed 3426.01 samples/sec Loss 10.7495 LearningRate 0.1311 Epoch: 4 Global Step: 24060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:03,921-Speed 3486.09 samples/sec Loss 10.9047 LearningRate 0.1311 Epoch: 4 Global Step: 24070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:06,932-Speed 3401.20 samples/sec Loss 11.0697 LearningRate 
0.1311 Epoch: 4 Global Step: 24080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:09,868-Speed 3489.05 samples/sec Loss 10.8241 LearningRate 0.1311 Epoch: 4 Global Step: 24090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:12,814-Speed 3476.32 samples/sec Loss 10.8754 LearningRate 0.1310 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:15,786-Speed 3446.51 samples/sec Loss 10.9487 LearningRate 0.1310 Epoch: 4 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:18,736-Speed 3471.94 samples/sec Loss 11.0067 LearningRate 0.1310 Epoch: 4 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:21,698-Speed 3458.04 samples/sec Loss 10.8700 LearningRate 0.1310 Epoch: 4 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:24,647-Speed 3473.74 samples/sec Loss 10.7659 LearningRate 0.1309 Epoch: 4 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:27,579-Speed 3494.01 samples/sec Loss 10.8130 LearningRate 0.1309 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:30,511-Speed 3493.74 samples/sec Loss 11.0006 LearningRate 0.1309 Epoch: 4 Global Step: 24160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:33,442-Speed 3493.57 samples/sec Loss 11.0465 LearningRate 0.1309 Epoch: 4 Global Step: 24170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:36,382-Speed 3483.89 samples/sec Loss 11.0011 LearningRate 0.1308 Epoch: 4 Global Step: 24180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:39,315-Speed 3492.61 samples/sec Loss 10.9089 LearningRate 0.1308 Epoch: 4 Global Step: 24190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:42,252-Speed 3487.88 samples/sec Loss 10.7279 LearningRate 0.1308 Epoch: 4 Global Step: 
24200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:45,190-Speed 3486.31 samples/sec Loss 11.0276 LearningRate 0.1307 Epoch: 4 Global Step: 24210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:48,128-Speed 3486.93 samples/sec Loss 11.1696 LearningRate 0.1307 Epoch: 4 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:51,070-Speed 3481.32 samples/sec Loss 10.8878 LearningRate 0.1307 Epoch: 4 Global Step: 24230 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:29:53,998-Speed 3498.87 samples/sec Loss 10.8390 LearningRate 0.1307 Epoch: 4 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:56,928-Speed 3495.95 samples/sec Loss 10.9995 LearningRate 0.1306 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:29:59,866-Speed 3485.97 samples/sec Loss 11.0226 LearningRate 0.1306 Epoch: 4 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:02,814-Speed 3474.48 samples/sec Loss 10.9227 LearningRate 0.1306 Epoch: 4 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:05,751-Speed 3487.69 samples/sec Loss 10.8132 LearningRate 0.1306 Epoch: 4 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:08,697-Speed 3476.56 samples/sec Loss 11.0588 LearningRate 0.1305 Epoch: 4 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:11,641-Speed 3478.94 samples/sec Loss 10.9590 LearningRate 0.1305 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:14,587-Speed 3477.72 samples/sec Loss 11.0334 LearningRate 0.1305 Epoch: 4 Global Step: 24310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:17,532-Speed 3477.13 samples/sec Loss 10.8261 LearningRate 0.1305 Epoch: 4 Global Step: 24320 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-01-19 19:30:20,469-Speed 3487.86 samples/sec Loss 11.2068 LearningRate 0.1304 Epoch: 4 Global Step: 24330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:23,397-Speed 3498.23 samples/sec Loss 10.9694 LearningRate 0.1304 Epoch: 4 Global Step: 24340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:26,334-Speed 3487.67 samples/sec Loss 10.9688 LearningRate 0.1304 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:29,267-Speed 3492.66 samples/sec Loss 11.0275 LearningRate 0.1304 Epoch: 4 Global Step: 24360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:32,281-Speed 3397.88 samples/sec Loss 10.8824 LearningRate 0.1303 Epoch: 4 Global Step: 24370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:35,330-Speed 3359.63 samples/sec Loss 10.9480 LearningRate 0.1303 Epoch: 4 Global Step: 24380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:38,290-Speed 3460.09 samples/sec Loss 10.9232 LearningRate 0.1303 Epoch: 4 Global Step: 24390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:41,276-Speed 3430.60 samples/sec Loss 10.9325 LearningRate 0.1303 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:44,211-Speed 3490.21 samples/sec Loss 10.9619 LearningRate 0.1302 Epoch: 4 Global Step: 24410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:47,139-Speed 3498.10 samples/sec Loss 10.8692 LearningRate 0.1302 Epoch: 4 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:50,073-Speed 3490.89 samples/sec Loss 10.9195 LearningRate 0.1302 Epoch: 4 Global Step: 24430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:53,002-Speed 3496.61 samples/sec Loss 11.1863 LearningRate 0.1302 Epoch: 4 Global Step: 24440 Fp16 Grad Scale: 262144 Required: 10 hours Training: 
2022-01-19 19:30:55,970-Speed 3451.31 samples/sec Loss 11.0134 LearningRate 0.1301 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:30:58,908-Speed 3485.73 samples/sec Loss 10.8161 LearningRate 0.1301 Epoch: 4 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:01,846-Speed 3486.91 samples/sec Loss 11.0246 LearningRate 0.1301 Epoch: 4 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:04,787-Speed 3482.23 samples/sec Loss 11.0193 LearningRate 0.1301 Epoch: 4 Global Step: 24480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:07,750-Speed 3457.60 samples/sec Loss 10.8255 LearningRate 0.1300 Epoch: 4 Global Step: 24490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:10,822-Speed 3334.54 samples/sec Loss 10.9964 LearningRate 0.1300 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:13,810-Speed 3427.76 samples/sec Loss 10.8809 LearningRate 0.1300 Epoch: 4 Global Step: 24510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:16,751-Speed 3482.62 samples/sec Loss 10.8331 LearningRate 0.1300 Epoch: 4 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:19,683-Speed 3494.12 samples/sec Loss 10.7952 LearningRate 0.1299 Epoch: 4 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:22,612-Speed 3496.36 samples/sec Loss 10.8948 LearningRate 0.1299 Epoch: 4 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:31:25,509-Speed 3536.67 samples/sec Loss 10.9157 LearningRate 0.1299 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:28,453-Speed 3478.97 samples/sec Loss 10.8086 LearningRate 0.1299 Epoch: 4 Global Step: 24560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:31,421-Speed 
3450.88 samples/sec Loss 10.8876 LearningRate 0.1298 Epoch: 4 Global Step: 24570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:34,354-Speed 3492.54 samples/sec Loss 10.8939 LearningRate 0.1298 Epoch: 4 Global Step: 24580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:37,323-Speed 3449.79 samples/sec Loss 11.0613 LearningRate 0.1298 Epoch: 4 Global Step: 24590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:40,252-Speed 3496.98 samples/sec Loss 10.8003 LearningRate 0.1298 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:43,189-Speed 3487.18 samples/sec Loss 10.9794 LearningRate 0.1297 Epoch: 4 Global Step: 24610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:46,181-Speed 3423.69 samples/sec Loss 10.8493 LearningRate 0.1297 Epoch: 4 Global Step: 24620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:49,160-Speed 3438.95 samples/sec Loss 10.9136 LearningRate 0.1297 Epoch: 4 Global Step: 24630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:52,092-Speed 3492.56 samples/sec Loss 11.0735 LearningRate 0.1297 Epoch: 4 Global Step: 24640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:31:55,023-Speed 3495.34 samples/sec Loss 10.8844 LearningRate 0.1296 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:31:58,021-Speed 3416.50 samples/sec Loss 10.9705 LearningRate 0.1296 Epoch: 4 Global Step: 24660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:00,949-Speed 3498.99 samples/sec Loss 10.9124 LearningRate 0.1296 Epoch: 4 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:03,882-Speed 3492.42 samples/sec Loss 10.8200 LearningRate 0.1295 Epoch: 4 Global Step: 24680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:06,808-Speed 3500.47 samples/sec Loss 11.0507 
LearningRate 0.1295 Epoch: 4 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:09,736-Speed 3498.54 samples/sec Loss 10.8921 LearningRate 0.1295 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:12,665-Speed 3497.43 samples/sec Loss 10.9150 LearningRate 0.1295 Epoch: 4 Global Step: 24710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:15,605-Speed 3483.77 samples/sec Loss 10.6655 LearningRate 0.1294 Epoch: 4 Global Step: 24720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:18,537-Speed 3492.84 samples/sec Loss 10.8744 LearningRate 0.1294 Epoch: 4 Global Step: 24730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:21,491-Speed 3468.43 samples/sec Loss 10.8704 LearningRate 0.1294 Epoch: 4 Global Step: 24740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:32:24,419-Speed 3497.35 samples/sec Loss 10.8433 LearningRate 0.1294 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:27,359-Speed 3484.36 samples/sec Loss 10.9150 LearningRate 0.1293 Epoch: 4 Global Step: 24760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:30,343-Speed 3432.96 samples/sec Loss 10.8064 LearningRate 0.1293 Epoch: 4 Global Step: 24770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:33,314-Speed 3447.17 samples/sec Loss 10.9263 LearningRate 0.1293 Epoch: 4 Global Step: 24780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:36,245-Speed 3494.74 samples/sec Loss 10.9878 LearningRate 0.1293 Epoch: 4 Global Step: 24790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:39,177-Speed 3492.97 samples/sec Loss 10.7996 LearningRate 0.1292 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:42,107-Speed 3496.34 samples/sec Loss 10.7606 LearningRate 0.1292 Epoch: 4 Global 
Step: 24810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:45,066-Speed 3461.10 samples/sec Loss 10.7983 LearningRate 0.1292 Epoch: 4 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:48,032-Speed 3453.25 samples/sec Loss 10.8403 LearningRate 0.1292 Epoch: 4 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:50,984-Speed 3470.74 samples/sec Loss 10.9837 LearningRate 0.1291 Epoch: 4 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:32:53,926-Speed 3480.59 samples/sec Loss 11.0059 LearningRate 0.1291 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:32:56,876-Speed 3472.40 samples/sec Loss 10.8290 LearningRate 0.1291 Epoch: 4 Global Step: 24860 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:32:59,802-Speed 3500.29 samples/sec Loss 10.9685 LearningRate 0.1291 Epoch: 4 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:02,773-Speed 3447.77 samples/sec Loss 10.8259 LearningRate 0.1290 Epoch: 4 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:05,710-Speed 3488.75 samples/sec Loss 10.9341 LearningRate 0.1290 Epoch: 4 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:08,689-Speed 3437.26 samples/sec Loss 10.9852 LearningRate 0.1290 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:11,657-Speed 3452.06 samples/sec Loss 10.8171 LearningRate 0.1290 Epoch: 4 Global Step: 24910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:14,653-Speed 3418.09 samples/sec Loss 10.7983 LearningRate 0.1289 Epoch: 4 Global Step: 24920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:17,626-Speed 3445.20 samples/sec Loss 10.9913 LearningRate 0.1289 Epoch: 4 Global Step: 24930 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-01-19 19:33:20,689-Speed 3344.73 samples/sec Loss 10.7737 LearningRate 0.1289 Epoch: 4 Global Step: 24940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:23,763-Speed 3330.99 samples/sec Loss 10.6690 LearningRate 0.1289 Epoch: 4 Global Step: 24950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:26,721-Speed 3463.16 samples/sec Loss 10.8788 LearningRate 0.1288 Epoch: 4 Global Step: 24960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:29,641-Speed 3508.86 samples/sec Loss 10.7634 LearningRate 0.1288 Epoch: 4 Global Step: 24970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:33:32,559-Speed 3509.70 samples/sec Loss 10.7235 LearningRate 0.1288 Epoch: 4 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:33:35,486-Speed 3498.99 samples/sec Loss 10.7982 LearningRate 0.1288 Epoch: 4 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:33:38,461-Speed 3443.48 samples/sec Loss 10.8772 LearningRate 0.1287 Epoch: 4 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:33:41,414-Speed 3468.43 samples/sec Loss 11.0697 LearningRate 0.1287 Epoch: 4 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:33:44,368-Speed 3467.37 samples/sec Loss 10.8028 LearningRate 0.1287 Epoch: 4 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:33:47,299-Speed 3495.39 samples/sec Loss 10.5797 LearningRate 0.1287 Epoch: 4 Global Step: 25030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:33:50,227-Speed 3498.71 samples/sec Loss 10.7076 LearningRate 0.1286 Epoch: 4 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:33:53,167-Speed 3483.74 samples/sec Loss 10.8177 LearningRate 0.1286 Epoch: 4 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-19 19:33:56,133-Speed 3454.01 samples/sec Loss 10.7834 LearningRate 0.1286 Epoch: 4 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:33:59,063-Speed 3497.71 samples/sec Loss 10.8872 LearningRate 0.1286 Epoch: 4 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:34:02,004-Speed 3482.42 samples/sec Loss 11.0559 LearningRate 0.1285 Epoch: 4 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:04,979-Speed 3443.52 samples/sec Loss 10.7716 LearningRate 0.1285 Epoch: 4 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:07,948-Speed 3448.77 samples/sec Loss 10.9885 LearningRate 0.1285 Epoch: 4 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:10,882-Speed 3492.33 samples/sec Loss 10.7989 LearningRate 0.1285 Epoch: 4 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:13,894-Speed 3399.99 samples/sec Loss 10.8333 LearningRate 0.1284 Epoch: 4 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:16,830-Speed 3488.52 samples/sec Loss 10.7789 LearningRate 0.1284 Epoch: 4 Global Step: 25130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:19,770-Speed 3484.39 samples/sec Loss 10.8627 LearningRate 0.1284 Epoch: 4 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:22,751-Speed 3435.72 samples/sec Loss 10.6700 LearningRate 0.1284 Epoch: 4 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:25,695-Speed 3479.48 samples/sec Loss 10.9739 LearningRate 0.1283 Epoch: 4 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:28,636-Speed 3483.45 samples/sec Loss 11.0834 LearningRate 0.1283 Epoch: 4 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:31,594-Speed 
3462.92 samples/sec Loss 10.9570 LearningRate 0.1283 Epoch: 4 Global Step: 25180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:34,532-Speed 3486.42 samples/sec Loss 10.8559 LearningRate 0.1283 Epoch: 4 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:37,476-Speed 3479.17 samples/sec Loss 10.8444 LearningRate 0.1282 Epoch: 4 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:40,408-Speed 3494.69 samples/sec Loss 10.7149 LearningRate 0.1282 Epoch: 4 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:43,341-Speed 3491.06 samples/sec Loss 10.8231 LearningRate 0.1282 Epoch: 4 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:46,274-Speed 3492.50 samples/sec Loss 10.8610 LearningRate 0.1282 Epoch: 4 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:49,211-Speed 3487.16 samples/sec Loss 10.7653 LearningRate 0.1281 Epoch: 4 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:52,194-Speed 3434.43 samples/sec Loss 10.7879 LearningRate 0.1281 Epoch: 4 Global Step: 25250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:55,132-Speed 3486.69 samples/sec Loss 10.9479 LearningRate 0.1281 Epoch: 4 Global Step: 25260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:34:58,103-Speed 3447.87 samples/sec Loss 10.7420 LearningRate 0.1281 Epoch: 4 Global Step: 25270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:01,115-Speed 3400.88 samples/sec Loss 11.0607 LearningRate 0.1280 Epoch: 4 Global Step: 25280 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:35:04,039-Speed 3502.91 samples/sec Loss 11.0049 LearningRate 0.1280 Epoch: 4 Global Step: 25290 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:35:16,048-Speed 852.75 samples/sec Loss 10.0359 
LearningRate 0.1280 Epoch: 5 Global Step: 25300 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:35:18,973-Speed 3502.48 samples/sec Loss 10.1937 LearningRate 0.1279 Epoch: 5 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:21,931-Speed 3462.73 samples/sec Loss 10.1824 LearningRate 0.1279 Epoch: 5 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:24,884-Speed 3468.64 samples/sec Loss 10.1042 LearningRate 0.1279 Epoch: 5 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:27,821-Speed 3488.23 samples/sec Loss 10.0405 LearningRate 0.1279 Epoch: 5 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:30,754-Speed 3491.67 samples/sec Loss 10.1674 LearningRate 0.1278 Epoch: 5 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:33,693-Speed 3484.88 samples/sec Loss 10.2799 LearningRate 0.1278 Epoch: 5 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:36,631-Speed 3486.44 samples/sec Loss 10.1644 LearningRate 0.1278 Epoch: 5 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:39,565-Speed 3491.10 samples/sec Loss 10.3463 LearningRate 0.1278 Epoch: 5 Global Step: 25380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:42,517-Speed 3469.66 samples/sec Loss 10.2011 LearningRate 0.1277 Epoch: 5 Global Step: 25390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:45,467-Speed 3472.51 samples/sec Loss 10.3092 LearningRate 0.1277 Epoch: 5 Global Step: 25400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:48,408-Speed 3482.18 samples/sec Loss 10.2062 LearningRate 0.1277 Epoch: 5 Global Step: 25410 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:35:51,343-Speed 3490.26 samples/sec Loss 10.1593 LearningRate 0.1277 Epoch: 5 
Global Step: 25420 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:35:54,312-Speed 3450.37 samples/sec Loss 10.0949 LearningRate 0.1276 Epoch: 5 Global Step: 25430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:35:57,256-Speed 3478.88 samples/sec Loss 10.2015 LearningRate 0.1276 Epoch: 5 Global Step: 25440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:00,217-Speed 3459.62 samples/sec Loss 10.2250 LearningRate 0.1276 Epoch: 5 Global Step: 25450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:03,242-Speed 3385.80 samples/sec Loss 10.5239 LearningRate 0.1276 Epoch: 5 Global Step: 25460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:06,183-Speed 3483.42 samples/sec Loss 10.3925 LearningRate 0.1275 Epoch: 5 Global Step: 25470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:09,129-Speed 3476.23 samples/sec Loss 10.3528 LearningRate 0.1275 Epoch: 5 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:12,074-Speed 3478.58 samples/sec Loss 10.3631 LearningRate 0.1275 Epoch: 5 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:15,022-Speed 3473.86 samples/sec Loss 10.2714 LearningRate 0.1275 Epoch: 5 Global Step: 25500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:17,971-Speed 3472.56 samples/sec Loss 10.2530 LearningRate 0.1274 Epoch: 5 Global Step: 25510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:20,963-Speed 3424.16 samples/sec Loss 10.2971 LearningRate 0.1274 Epoch: 5 Global Step: 25520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:23,903-Speed 3483.57 samples/sec Loss 10.2416 LearningRate 0.1274 Epoch: 5 Global Step: 25530 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:36:26,831-Speed 3499.21 samples/sec Loss 10.3314 LearningRate 0.1274 Epoch: 5 Global Step: 25540 Fp16 Grad 
Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:29,771-Speed 3482.97 samples/sec Loss 10.2935 LearningRate 0.1273 Epoch: 5 Global Step: 25550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:32,707-Speed 3489.21 samples/sec Loss 10.1778 LearningRate 0.1273 Epoch: 5 Global Step: 25560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:35,695-Speed 3428.01 samples/sec Loss 10.2411 LearningRate 0.1273 Epoch: 5 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:38,639-Speed 3479.47 samples/sec Loss 10.4133 LearningRate 0.1273 Epoch: 5 Global Step: 25580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:41,592-Speed 3469.04 samples/sec Loss 10.4734 LearningRate 0.1272 Epoch: 5 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:44,625-Speed 3377.05 samples/sec Loss 10.4341 LearningRate 0.1272 Epoch: 5 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:47,612-Speed 3428.26 samples/sec Loss 10.5659 LearningRate 0.1272 Epoch: 5 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:50,555-Speed 3480.81 samples/sec Loss 10.5212 LearningRate 0.1272 Epoch: 5 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:53,491-Speed 3489.29 samples/sec Loss 10.3765 LearningRate 0.1271 Epoch: 5 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:36:56,474-Speed 3433.47 samples/sec Loss 10.4041 LearningRate 0.1271 Epoch: 5 Global Step: 25640 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:36:59,408-Speed 3490.36 samples/sec Loss 10.5905 LearningRate 0.1271 Epoch: 5 Global Step: 25650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:02,386-Speed 3440.32 samples/sec Loss 10.4607 LearningRate 0.1271 Epoch: 5 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-01-19 19:37:05,319-Speed 3491.78 samples/sec Loss 10.5795 LearningRate 0.1270 Epoch: 5 Global Step: 25670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:08,269-Speed 3472.66 samples/sec Loss 10.4565 LearningRate 0.1270 Epoch: 5 Global Step: 25680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:11,201-Speed 3493.47 samples/sec Loss 10.5697 LearningRate 0.1270 Epoch: 5 Global Step: 25690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:14,134-Speed 3492.09 samples/sec Loss 10.6505 LearningRate 0.1270 Epoch: 5 Global Step: 25700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:17,073-Speed 3484.63 samples/sec Loss 10.7159 LearningRate 0.1269 Epoch: 5 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:20,001-Speed 3497.84 samples/sec Loss 10.4069 LearningRate 0.1269 Epoch: 5 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:22,944-Speed 3480.75 samples/sec Loss 10.5502 LearningRate 0.1269 Epoch: 5 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:25,864-Speed 3508.18 samples/sec Loss 10.5540 LearningRate 0.1269 Epoch: 5 Global Step: 25740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:28,810-Speed 3476.22 samples/sec Loss 10.4474 LearningRate 0.1268 Epoch: 5 Global Step: 25750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:31,816-Speed 3407.98 samples/sec Loss 10.5929 LearningRate 0.1268 Epoch: 5 Global Step: 25760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:34,841-Speed 3386.44 samples/sec Loss 10.3618 LearningRate 0.1268 Epoch: 5 Global Step: 25770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:37,783-Speed 3481.23 samples/sec Loss 10.5234 LearningRate 0.1268 Epoch: 5 Global Step: 25780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 
19:37:40,721-Speed 3486.03 samples/sec Loss 10.5623 LearningRate 0.1267 Epoch: 5 Global Step: 25790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:43,658-Speed 3487.90 samples/sec Loss 10.5848 LearningRate 0.1267 Epoch: 5 Global Step: 25800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:46,626-Speed 3450.64 samples/sec Loss 10.6099 LearningRate 0.1267 Epoch: 5 Global Step: 25810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:49,621-Speed 3420.39 samples/sec Loss 10.6373 LearningRate 0.1267 Epoch: 5 Global Step: 25820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:52,589-Speed 3450.90 samples/sec Loss 10.5155 LearningRate 0.1266 Epoch: 5 Global Step: 25830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:37:55,533-Speed 3479.51 samples/sec Loss 10.6354 LearningRate 0.1266 Epoch: 5 Global Step: 25840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:37:58,464-Speed 3495.10 samples/sec Loss 10.4576 LearningRate 0.1266 Epoch: 5 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:01,466-Speed 3412.03 samples/sec Loss 10.6461 LearningRate 0.1266 Epoch: 5 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:04,399-Speed 3491.88 samples/sec Loss 10.3822 LearningRate 0.1265 Epoch: 5 Global Step: 25870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:07,376-Speed 3440.85 samples/sec Loss 10.6938 LearningRate 0.1265 Epoch: 5 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:10,314-Speed 3485.65 samples/sec Loss 10.7683 LearningRate 0.1265 Epoch: 5 Global Step: 25890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:13,251-Speed 3487.66 samples/sec Loss 10.4903 LearningRate 0.1265 Epoch: 5 Global Step: 25900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:16,202-Speed 3471.45 
samples/sec Loss 10.6416 LearningRate 0.1264 Epoch: 5 Global Step: 25910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:19,131-Speed 3497.37 samples/sec Loss 10.7027 LearningRate 0.1264 Epoch: 5 Global Step: 25920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:22,067-Speed 3489.44 samples/sec Loss 10.5842 LearningRate 0.1264 Epoch: 5 Global Step: 25930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:25,015-Speed 3473.31 samples/sec Loss 10.4702 LearningRate 0.1264 Epoch: 5 Global Step: 25940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:27,949-Speed 3491.02 samples/sec Loss 10.6487 LearningRate 0.1263 Epoch: 5 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:30,892-Speed 3481.02 samples/sec Loss 10.6666 LearningRate 0.1263 Epoch: 5 Global Step: 25960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:33,865-Speed 3444.83 samples/sec Loss 10.6795 LearningRate 0.1263 Epoch: 5 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:36,833-Speed 3451.31 samples/sec Loss 10.5551 LearningRate 0.1263 Epoch: 5 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:39,781-Speed 3474.56 samples/sec Loss 10.6042 LearningRate 0.1262 Epoch: 5 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:38:42,732-Speed 3470.92 samples/sec Loss 10.6437 LearningRate 0.1262 Epoch: 5 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:39:26,188-[lfw][26000]XNorm: 21.951595 Training: 2022-01-19 19:39:26,188-[lfw][26000]Accuracy-Flip: 0.99600+-0.00335 Training: 2022-01-19 19:39:26,189-[lfw][26000]Accuracy-Highest: 0.99700 Training: 2022-01-19 19:40:16,562-[cfp_fp][26000]XNorm: 18.648764 Training: 2022-01-19 19:40:16,563-[cfp_fp][26000]Accuracy-Flip: 0.95157+-0.01181 Training: 2022-01-19 
19:40:16,564-[cfp_fp][26000]Accuracy-Highest: 0.95157 Training: 2022-01-19 19:40:59,929-[agedb_30][26000]XNorm: 21.548375 Training: 2022-01-19 19:40:59,930-[agedb_30][26000]Accuracy-Flip: 0.96983+-0.00886 Training: 2022-01-19 19:40:59,930-[agedb_30][26000]Accuracy-Highest: 0.96983 Training: 2022-01-19 19:41:02,860-Speed 73.08 samples/sec Loss 10.5383 LearningRate 0.1262 Epoch: 5 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:05,816-Speed 3464.54 samples/sec Loss 10.4941 LearningRate 0.1262 Epoch: 5 Global Step: 26020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:08,755-Speed 3484.29 samples/sec Loss 10.8241 LearningRate 0.1261 Epoch: 5 Global Step: 26030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:11,703-Speed 3474.91 samples/sec Loss 10.6564 LearningRate 0.1261 Epoch: 5 Global Step: 26040 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:41:14,614-Speed 3518.78 samples/sec Loss 10.6213 LearningRate 0.1261 Epoch: 5 Global Step: 26050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:17,551-Speed 3488.29 samples/sec Loss 10.6299 LearningRate 0.1261 Epoch: 5 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:20,479-Speed 3498.29 samples/sec Loss 10.5633 LearningRate 0.1260 Epoch: 5 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:23,424-Speed 3476.80 samples/sec Loss 10.5656 LearningRate 0.1260 Epoch: 5 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:26,354-Speed 3496.21 samples/sec Loss 10.8099 LearningRate 0.1260 Epoch: 5 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:29,290-Speed 3489.11 samples/sec Loss 10.6120 LearningRate 0.1260 Epoch: 5 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:32,226-Speed 3488.92 samples/sec Loss 10.7346 
LearningRate 0.1259 Epoch: 5 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:35,163-Speed 3486.43 samples/sec Loss 10.7370 LearningRate 0.1259 Epoch: 5 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:41:38,139-Speed 3441.70 samples/sec Loss 10.6400 LearningRate 0.1259 Epoch: 5 Global Step: 26130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:41:41,115-Speed 3442.81 samples/sec Loss 10.6348 LearningRate 0.1259 Epoch: 5 Global Step: 26140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:41:44,094-Speed 3437.92 samples/sec Loss 10.3896 LearningRate 0.1258 Epoch: 5 Global Step: 26150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:41:47,053-Speed 3462.26 samples/sec Loss 10.5952 LearningRate 0.1258 Epoch: 5 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:41:49,979-Speed 3500.29 samples/sec Loss 10.5823 LearningRate 0.1258 Epoch: 5 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:41:52,903-Speed 3502.82 samples/sec Loss 10.7145 LearningRate 0.1258 Epoch: 5 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:41:55,849-Speed 3477.69 samples/sec Loss 10.5999 LearningRate 0.1257 Epoch: 5 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:41:58,775-Speed 3499.96 samples/sec Loss 10.6581 LearningRate 0.1257 Epoch: 5 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:42:01,705-Speed 3496.52 samples/sec Loss 10.6063 LearningRate 0.1257 Epoch: 5 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:42:04,721-Speed 3395.02 samples/sec Loss 10.5695 LearningRate 0.1257 Epoch: 5 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:42:07,712-Speed 3425.90 samples/sec Loss 10.6800 LearningRate 0.1256 Epoch: 5 Global Step: 
26230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:10,648-Speed 3488.21 samples/sec Loss 10.5762 LearningRate 0.1256 Epoch: 5 Global Step: 26240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:13,590-Speed 3481.33 samples/sec Loss 10.5164 LearningRate 0.1256 Epoch: 5 Global Step: 26250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:16,557-Speed 3452.44 samples/sec Loss 10.6764 LearningRate 0.1256 Epoch: 5 Global Step: 26260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:19,553-Speed 3419.06 samples/sec Loss 10.6366 LearningRate 0.1255 Epoch: 5 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:22,488-Speed 3490.56 samples/sec Loss 10.6258 LearningRate 0.1255 Epoch: 5 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:25,469-Speed 3435.89 samples/sec Loss 10.6576 LearningRate 0.1255 Epoch: 5 Global Step: 26290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:28,440-Speed 3447.13 samples/sec Loss 10.7514 LearningRate 0.1255 Epoch: 5 Global Step: 26300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:31,382-Speed 3481.81 samples/sec Loss 10.6292 LearningRate 0.1254 Epoch: 5 Global Step: 26310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:34,317-Speed 3489.54 samples/sec Loss 10.6451 LearningRate 0.1254 Epoch: 5 Global Step: 26320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:37,252-Speed 3490.17 samples/sec Loss 10.5909 LearningRate 0.1254 Epoch: 5 Global Step: 26330 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:42:40,180-Speed 3498.44 samples/sec Loss 10.5426 LearningRate 0.1254 Epoch: 5 Global Step: 26340 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:42:43,118-Speed 3486.03 samples/sec Loss 10.5877 LearningRate 0.1253 Epoch: 5 Global Step: 26350 Fp16 Grad Scale: 262144 
Required: 10 hours Training: 2022-01-19 19:42:46,068-Speed 3473.42 samples/sec Loss 10.8191 LearningRate 0.1253 Epoch: 5 Global Step: 26360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:48,994-Speed 3500.77 samples/sec Loss 10.5229 LearningRate 0.1253 Epoch: 5 Global Step: 26370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:51,960-Speed 3454.90 samples/sec Loss 10.6239 LearningRate 0.1253 Epoch: 5 Global Step: 26380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:54,888-Speed 3497.23 samples/sec Loss 10.5204 LearningRate 0.1252 Epoch: 5 Global Step: 26390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:42:57,820-Speed 3493.43 samples/sec Loss 10.5123 LearningRate 0.1252 Epoch: 5 Global Step: 26400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:00,835-Speed 3398.17 samples/sec Loss 10.6560 LearningRate 0.1252 Epoch: 5 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:03,764-Speed 3496.70 samples/sec Loss 10.5163 LearningRate 0.1252 Epoch: 5 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:06,711-Speed 3476.09 samples/sec Loss 10.5346 LearningRate 0.1251 Epoch: 5 Global Step: 26430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:09,640-Speed 3496.71 samples/sec Loss 10.5294 LearningRate 0.1251 Epoch: 5 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:12,638-Speed 3417.00 samples/sec Loss 10.5793 LearningRate 0.1251 Epoch: 5 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:15,692-Speed 3353.53 samples/sec Loss 10.6328 LearningRate 0.1251 Epoch: 5 Global Step: 26460 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:43:18,666-Speed 3443.77 samples/sec Loss 10.4670 LearningRate 0.1250 Epoch: 5 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-19 19:43:21,614-Speed 3475.77 samples/sec Loss 10.6281 LearningRate 0.1250 Epoch: 5 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:24,597-Speed 3433.21 samples/sec Loss 10.6859 LearningRate 0.1250 Epoch: 5 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:27,567-Speed 3452.74 samples/sec Loss 10.8345 LearningRate 0.1250 Epoch: 5 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:30,496-Speed 3497.36 samples/sec Loss 10.7340 LearningRate 0.1249 Epoch: 5 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:33,433-Speed 3487.54 samples/sec Loss 10.5925 LearningRate 0.1249 Epoch: 5 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:36,366-Speed 3492.01 samples/sec Loss 10.7623 LearningRate 0.1249 Epoch: 5 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:39,335-Speed 3450.51 samples/sec Loss 10.5599 LearningRate 0.1249 Epoch: 5 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:42,284-Speed 3473.82 samples/sec Loss 10.6551 LearningRate 0.1248 Epoch: 5 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:45,206-Speed 3504.90 samples/sec Loss 10.4783 LearningRate 0.1248 Epoch: 5 Global Step: 26560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:48,132-Speed 3499.74 samples/sec Loss 10.7335 LearningRate 0.1248 Epoch: 5 Global Step: 26570 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:43:51,048-Speed 3512.74 samples/sec Loss 10.6620 LearningRate 0.1248 Epoch: 5 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:53,984-Speed 3488.60 samples/sec Loss 10.6063 LearningRate 0.1247 Epoch: 5 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:56,913-Speed 
3496.80 samples/sec Loss 10.7683 LearningRate 0.1247 Epoch: 5 Global Step: 26600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:43:59,845-Speed 3494.04 samples/sec Loss 10.7993 LearningRate 0.1247 Epoch: 5 Global Step: 26610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:02,789-Speed 3479.31 samples/sec Loss 10.7090 LearningRate 0.1247 Epoch: 5 Global Step: 26620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:05,747-Speed 3462.99 samples/sec Loss 10.5260 LearningRate 0.1246 Epoch: 5 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:08,768-Speed 3390.62 samples/sec Loss 10.7560 LearningRate 0.1246 Epoch: 5 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:11,692-Speed 3503.24 samples/sec Loss 10.7134 LearningRate 0.1246 Epoch: 5 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:14,622-Speed 3495.40 samples/sec Loss 10.6088 LearningRate 0.1246 Epoch: 5 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:17,549-Speed 3499.79 samples/sec Loss 10.6844 LearningRate 0.1245 Epoch: 5 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:20,506-Speed 3463.58 samples/sec Loss 10.7906 LearningRate 0.1245 Epoch: 5 Global Step: 26680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:23,444-Speed 3486.16 samples/sec Loss 10.6894 LearningRate 0.1245 Epoch: 5 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:26,384-Speed 3484.63 samples/sec Loss 10.5970 LearningRate 0.1245 Epoch: 5 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:29,326-Speed 3481.73 samples/sec Loss 10.4866 LearningRate 0.1244 Epoch: 5 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:32,257-Speed 3495.58 samples/sec Loss 
10.4695 LearningRate 0.1244 Epoch: 5 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:35,186-Speed 3495.99 samples/sec Loss 10.5660 LearningRate 0.1244 Epoch: 5 Global Step: 26730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:38,198-Speed 3400.81 samples/sec Loss 10.5197 LearningRate 0.1244 Epoch: 5 Global Step: 26740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:41,153-Speed 3466.14 samples/sec Loss 10.5325 LearningRate 0.1243 Epoch: 5 Global Step: 26750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:44,081-Speed 3497.97 samples/sec Loss 10.5172 LearningRate 0.1243 Epoch: 5 Global Step: 26760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:47,037-Speed 3496.52 samples/sec Loss 10.7058 LearningRate 0.1243 Epoch: 5 Global Step: 26770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:49,963-Speed 3501.12 samples/sec Loss 10.4221 LearningRate 0.1243 Epoch: 5 Global Step: 26780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:53,025-Speed 3396.56 samples/sec Loss 10.8090 LearningRate 0.1242 Epoch: 5 Global Step: 26790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:55,951-Speed 3500.54 samples/sec Loss 10.5504 LearningRate 0.1242 Epoch: 5 Global Step: 26800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:44:58,880-Speed 3497.60 samples/sec Loss 10.5474 LearningRate 0.1242 Epoch: 5 Global Step: 26810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:01,850-Speed 3501.70 samples/sec Loss 10.6767 LearningRate 0.1242 Epoch: 5 Global Step: 26820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:04,845-Speed 3418.89 samples/sec Loss 10.6172 LearningRate 0.1241 Epoch: 5 Global Step: 26830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:07,851-Speed 3496.36 samples/sec Loss 10.5360 LearningRate 0.1241 
Epoch: 5 Global Step: 26840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:10,780-Speed 3497.46 samples/sec Loss 10.6118 LearningRate 0.1241 Epoch: 5 Global Step: 26850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:13,778-Speed 3416.38 samples/sec Loss 10.5452 LearningRate 0.1241 Epoch: 5 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:16,706-Speed 3498.16 samples/sec Loss 10.6935 LearningRate 0.1240 Epoch: 5 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:19,667-Speed 3458.88 samples/sec Loss 10.6838 LearningRate 0.1240 Epoch: 5 Global Step: 26880 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:45:22,625-Speed 3463.78 samples/sec Loss 10.5536 LearningRate 0.1240 Epoch: 5 Global Step: 26890 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:45:25,616-Speed 3424.75 samples/sec Loss 10.6264 LearningRate 0.1240 Epoch: 5 Global Step: 26900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:28,550-Speed 3490.52 samples/sec Loss 10.6820 LearningRate 0.1239 Epoch: 5 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:31,547-Speed 3417.55 samples/sec Loss 10.5703 LearningRate 0.1239 Epoch: 5 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:34,546-Speed 3415.20 samples/sec Loss 10.4954 LearningRate 0.1239 Epoch: 5 Global Step: 26930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:37,500-Speed 3468.28 samples/sec Loss 10.6123 LearningRate 0.1239 Epoch: 5 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:40,490-Speed 3425.36 samples/sec Loss 10.6347 LearningRate 0.1238 Epoch: 5 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:43,497-Speed 3406.19 samples/sec Loss 10.4606 LearningRate 0.1238 Epoch: 5 Global Step: 26960 
Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:46,494-Speed 3417.18 samples/sec Loss 10.3972 LearningRate 0.1238 Epoch: 5 Global Step: 26970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:49,465-Speed 3448.25 samples/sec Loss 10.3861 LearningRate 0.1238 Epoch: 5 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:52,415-Speed 3472.82 samples/sec Loss 10.4069 LearningRate 0.1237 Epoch: 5 Global Step: 26990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:45:55,358-Speed 3479.37 samples/sec Loss 10.5567 LearningRate 0.1237 Epoch: 5 Global Step: 27000 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:45:58,335-Speed 3441.55 samples/sec Loss 10.5586 LearningRate 0.1237 Epoch: 5 Global Step: 27010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:01,365-Speed 3380.10 samples/sec Loss 10.4722 LearningRate 0.1237 Epoch: 5 Global Step: 27020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:04,297-Speed 3493.17 samples/sec Loss 10.4157 LearningRate 0.1236 Epoch: 5 Global Step: 27030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:07,246-Speed 3473.78 samples/sec Loss 10.4958 LearningRate 0.1236 Epoch: 5 Global Step: 27040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:10,226-Speed 3436.91 samples/sec Loss 10.6378 LearningRate 0.1236 Epoch: 5 Global Step: 27050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:13,174-Speed 3474.81 samples/sec Loss 10.6588 LearningRate 0.1236 Epoch: 5 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:16,119-Speed 3477.69 samples/sec Loss 10.5259 LearningRate 0.1235 Epoch: 5 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:19,064-Speed 3478.10 samples/sec Loss 10.7394 LearningRate 0.1235 Epoch: 5 Global Step: 27080 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-01-19 19:46:22,042-Speed 3439.30 samples/sec Loss 10.5361 LearningRate 0.1235 Epoch: 5 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:25,011-Speed 3450.10 samples/sec Loss 10.5205 LearningRate 0.1235 Epoch: 5 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:27,943-Speed 3493.86 samples/sec Loss 10.5540 LearningRate 0.1234 Epoch: 5 Global Step: 27110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:46:30,869-Speed 3500.37 samples/sec Loss 10.8149 LearningRate 0.1234 Epoch: 5 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:33,890-Speed 3389.69 samples/sec Loss 10.7148 LearningRate 0.1234 Epoch: 5 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:36,827-Speed 3488.45 samples/sec Loss 10.6731 LearningRate 0.1234 Epoch: 5 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:39,759-Speed 3492.62 samples/sec Loss 10.5152 LearningRate 0.1233 Epoch: 5 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:42,726-Speed 3453.12 samples/sec Loss 10.7358 LearningRate 0.1233 Epoch: 5 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:45,765-Speed 3370.46 samples/sec Loss 10.6383 LearningRate 0.1233 Epoch: 5 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:48,760-Speed 3419.41 samples/sec Loss 10.7370 LearningRate 0.1233 Epoch: 5 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:51,698-Speed 3487.24 samples/sec Loss 10.4780 LearningRate 0.1232 Epoch: 5 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:46:54,656-Speed 3461.72 samples/sec Loss 10.4649 LearningRate 0.1232 Epoch: 5 Global Step: 27200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-19 19:46:57,585-Speed 3497.44 samples/sec Loss 10.3694 LearningRate 0.1232 Epoch: 5 Global Step: 27210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:00,503-Speed 3510.06 samples/sec Loss 10.5930 LearningRate 0.1232 Epoch: 5 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:03,442-Speed 3485.42 samples/sec Loss 10.3968 LearningRate 0.1231 Epoch: 5 Global Step: 27230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:06,377-Speed 3489.38 samples/sec Loss 10.6359 LearningRate 0.1231 Epoch: 5 Global Step: 27240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:09,327-Speed 3472.22 samples/sec Loss 10.6121 LearningRate 0.1231 Epoch: 5 Global Step: 27250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:12,256-Speed 3498.53 samples/sec Loss 10.7905 LearningRate 0.1231 Epoch: 5 Global Step: 27260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:15,203-Speed 3475.85 samples/sec Loss 10.3626 LearningRate 0.1230 Epoch: 5 Global Step: 27270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:18,141-Speed 3486.97 samples/sec Loss 10.4796 LearningRate 0.1230 Epoch: 5 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:21,074-Speed 3491.32 samples/sec Loss 10.4729 LearningRate 0.1230 Epoch: 5 Global Step: 27290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:24,019-Speed 3478.30 samples/sec Loss 10.6977 LearningRate 0.1230 Epoch: 5 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:26,979-Speed 3460.89 samples/sec Loss 10.6340 LearningRate 0.1229 Epoch: 5 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:29,958-Speed 3437.87 samples/sec Loss 10.6101 LearningRate 0.1229 Epoch: 5 Global Step: 27320 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:47:32,878-Speed 
3510.16 samples/sec Loss 10.4237 LearningRate 0.1229 Epoch: 5 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:35,885-Speed 3406.30 samples/sec Loss 10.3764 LearningRate 0.1229 Epoch: 5 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:38,825-Speed 3483.29 samples/sec Loss 10.5112 LearningRate 0.1228 Epoch: 5 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:41,772-Speed 3477.14 samples/sec Loss 10.5297 LearningRate 0.1228 Epoch: 5 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:44,729-Speed 3463.72 samples/sec Loss 10.6011 LearningRate 0.1228 Epoch: 5 Global Step: 27370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:47,674-Speed 3478.84 samples/sec Loss 10.4730 LearningRate 0.1228 Epoch: 5 Global Step: 27380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:50,652-Speed 3438.69 samples/sec Loss 10.2967 LearningRate 0.1227 Epoch: 5 Global Step: 27390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:53,619-Speed 3452.45 samples/sec Loss 10.6166 LearningRate 0.1227 Epoch: 5 Global Step: 27400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:56,615-Speed 3419.14 samples/sec Loss 10.3052 LearningRate 0.1227 Epoch: 5 Global Step: 27410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:47:59,543-Speed 3498.01 samples/sec Loss 10.3879 LearningRate 0.1227 Epoch: 5 Global Step: 27420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:02,541-Speed 3416.95 samples/sec Loss 10.5420 LearningRate 0.1226 Epoch: 5 Global Step: 27430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:05,538-Speed 3418.08 samples/sec Loss 10.4874 LearningRate 0.1226 Epoch: 5 Global Step: 27440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:08,499-Speed 3460.28 samples/sec Loss 
10.4409 LearningRate 0.1226 Epoch: 5 Global Step: 27450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:11,435-Speed 3488.67 samples/sec Loss 10.4355 LearningRate 0.1226 Epoch: 5 Global Step: 27460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:14,404-Speed 3449.47 samples/sec Loss 10.5077 LearningRate 0.1225 Epoch: 5 Global Step: 27470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:17,341-Speed 3487.71 samples/sec Loss 10.5252 LearningRate 0.1225 Epoch: 5 Global Step: 27480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:20,271-Speed 3495.62 samples/sec Loss 10.5921 LearningRate 0.1225 Epoch: 5 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:23,207-Speed 3488.74 samples/sec Loss 10.3127 LearningRate 0.1225 Epoch: 5 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:26,157-Speed 3472.36 samples/sec Loss 10.5418 LearningRate 0.1224 Epoch: 5 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:29,106-Speed 3473.42 samples/sec Loss 10.6309 LearningRate 0.1224 Epoch: 5 Global Step: 27520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:32,105-Speed 3415.58 samples/sec Loss 10.5712 LearningRate 0.1224 Epoch: 5 Global Step: 27530 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:48:35,068-Speed 3456.90 samples/sec Loss 10.3883 LearningRate 0.1224 Epoch: 5 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:38,024-Speed 3465.40 samples/sec Loss 10.4427 LearningRate 0.1223 Epoch: 5 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:40,964-Speed 3483.98 samples/sec Loss 10.5098 LearningRate 0.1223 Epoch: 5 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:43,940-Speed 3441.74 samples/sec Loss 10.4800 LearningRate 0.1223 
Epoch: 5 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:46,907-Speed 3452.81 samples/sec Loss 10.2126 LearningRate 0.1223 Epoch: 5 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:49,858-Speed 3470.28 samples/sec Loss 10.4926 LearningRate 0.1222 Epoch: 5 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:52,794-Speed 3489.13 samples/sec Loss 10.5444 LearningRate 0.1222 Epoch: 5 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:55,816-Speed 3389.38 samples/sec Loss 10.5015 LearningRate 0.1222 Epoch: 5 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:48:58,759-Speed 3480.15 samples/sec Loss 10.4345 LearningRate 0.1222 Epoch: 5 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:01,755-Speed 3418.21 samples/sec Loss 10.3984 LearningRate 0.1221 Epoch: 5 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:04,685-Speed 3496.46 samples/sec Loss 10.6169 LearningRate 0.1221 Epoch: 5 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:07,647-Speed 3459.90 samples/sec Loss 10.3405 LearningRate 0.1221 Epoch: 5 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:10,643-Speed 3418.66 samples/sec Loss 10.4931 LearningRate 0.1221 Epoch: 5 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:13,581-Speed 3486.79 samples/sec Loss 10.3028 LearningRate 0.1220 Epoch: 5 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:16,527-Speed 3476.39 samples/sec Loss 10.5172 LearningRate 0.1220 Epoch: 5 Global Step: 27680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:19,486-Speed 3460.54 samples/sec Loss 10.4959 LearningRate 0.1220 Epoch: 5 Global Step: 27690 Fp16 
Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:22,444-Speed 3464.41 samples/sec Loss 10.7176 LearningRate 0.1220 Epoch: 5 Global Step: 27700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:25,373-Speed 3496.69 samples/sec Loss 10.4551 LearningRate 0.1219 Epoch: 5 Global Step: 27710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:28,408-Speed 3375.56 samples/sec Loss 10.5038 LearningRate 0.1219 Epoch: 5 Global Step: 27720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:31,349-Speed 3481.84 samples/sec Loss 10.5188 LearningRate 0.1219 Epoch: 5 Global Step: 27730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:34,384-Speed 3375.43 samples/sec Loss 10.4731 LearningRate 0.1219 Epoch: 5 Global Step: 27740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:37,388-Speed 3410.06 samples/sec Loss 10.4565 LearningRate 0.1219 Epoch: 5 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:40,319-Speed 3494.05 samples/sec Loss 10.4031 LearningRate 0.1218 Epoch: 5 Global Step: 27760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:43,251-Speed 3493.28 samples/sec Loss 10.5399 LearningRate 0.1218 Epoch: 5 Global Step: 27770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:49:46,186-Speed 3490.68 samples/sec Loss 10.5315 LearningRate 0.1218 Epoch: 5 Global Step: 27780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:49,129-Speed 3479.96 samples/sec Loss 10.3824 LearningRate 0.1218 Epoch: 5 Global Step: 27790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:52,068-Speed 3485.60 samples/sec Loss 10.3567 LearningRate 0.1217 Epoch: 5 Global Step: 27800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:49:55,012-Speed 3478.04 samples/sec Loss 10.3891 LearningRate 0.1217 Epoch: 5 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-01-19 19:49:57,945-Speed 3493.04 samples/sec Loss 10.4888 LearningRate 0.1217 Epoch: 5 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:00,880-Speed 3490.70 samples/sec Loss 10.7163 LearningRate 0.1217 Epoch: 5 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:03,808-Speed 3498.08 samples/sec Loss 10.4500 LearningRate 0.1216 Epoch: 5 Global Step: 27840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:06,743-Speed 3490.25 samples/sec Loss 10.6500 LearningRate 0.1216 Epoch: 5 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:09,667-Speed 3502.06 samples/sec Loss 10.5532 LearningRate 0.1216 Epoch: 5 Global Step: 27860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:12,605-Speed 3487.64 samples/sec Loss 10.5392 LearningRate 0.1216 Epoch: 5 Global Step: 27870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:15,543-Speed 3485.18 samples/sec Loss 10.4557 LearningRate 0.1215 Epoch: 5 Global Step: 27880 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:50:18,479-Speed 3489.59 samples/sec Loss 10.4993 LearningRate 0.1215 Epoch: 5 Global Step: 27890 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:50:21,437-Speed 3462.14 samples/sec Loss 10.5902 LearningRate 0.1215 Epoch: 5 Global Step: 27900 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:50:24,356-Speed 3508.58 samples/sec Loss 10.4268 LearningRate 0.1215 Epoch: 5 Global Step: 27910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:27,291-Speed 3490.35 samples/sec Loss 10.4129 LearningRate 0.1214 Epoch: 5 Global Step: 27920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:30,243-Speed 3470.32 samples/sec Loss 10.5635 LearningRate 0.1214 Epoch: 5 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 
19:50:33,288-Speed 3364.32 samples/sec Loss 10.4263 LearningRate 0.1214 Epoch: 5 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:36,253-Speed 3453.57 samples/sec Loss 10.3277 LearningRate 0.1214 Epoch: 5 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:39,187-Speed 3491.46 samples/sec Loss 10.6946 LearningRate 0.1213 Epoch: 5 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:42,134-Speed 3475.70 samples/sec Loss 10.6156 LearningRate 0.1213 Epoch: 5 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:45,076-Speed 3480.91 samples/sec Loss 10.4921 LearningRate 0.1213 Epoch: 5 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:48,014-Speed 3487.53 samples/sec Loss 10.6599 LearningRate 0.1213 Epoch: 5 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:50:50,954-Speed 3482.92 samples/sec Loss 10.5969 LearningRate 0.1212 Epoch: 5 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:51:34,344-[lfw][28000]XNorm: 22.142225 Training: 2022-01-19 19:51:34,344-[lfw][28000]Accuracy-Flip: 0.99667+-0.00325 Training: 2022-01-19 19:51:34,345-[lfw][28000]Accuracy-Highest: 0.99700 Training: 2022-01-19 19:52:24,455-[cfp_fp][28000]XNorm: 19.085154 Training: 2022-01-19 19:52:24,456-[cfp_fp][28000]Accuracy-Flip: 0.95514+-0.00994 Training: 2022-01-19 19:52:24,456-[cfp_fp][28000]Accuracy-Highest: 0.95514 Training: 2022-01-19 19:53:07,570-[agedb_30][28000]XNorm: 21.642880 Training: 2022-01-19 19:53:07,571-[agedb_30][28000]Accuracy-Flip: 0.96817+-0.00621 Training: 2022-01-19 19:53:07,572-[agedb_30][28000]Accuracy-Highest: 0.96983 Training: 2022-01-19 19:53:10,516-Speed 73.37 samples/sec Loss 10.4425 LearningRate 0.1212 Epoch: 5 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:13,439-Speed 3504.48 
samples/sec Loss 10.5542 LearningRate 0.1212 Epoch: 5 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:16,366-Speed 3498.53 samples/sec Loss 10.5109 LearningRate 0.1212 Epoch: 5 Global Step: 28030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:19,287-Speed 3507.04 samples/sec Loss 10.5357 LearningRate 0.1211 Epoch: 5 Global Step: 28040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:22,231-Speed 3479.16 samples/sec Loss 10.3301 LearningRate 0.1211 Epoch: 5 Global Step: 28050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:25,157-Speed 3501.67 samples/sec Loss 10.4223 LearningRate 0.1211 Epoch: 5 Global Step: 28060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:28,125-Speed 3451.28 samples/sec Loss 10.2932 LearningRate 0.1211 Epoch: 5 Global Step: 28070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:31,092-Speed 3451.91 samples/sec Loss 10.3183 LearningRate 0.1210 Epoch: 5 Global Step: 28080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:34,052-Speed 3460.34 samples/sec Loss 10.6011 LearningRate 0.1210 Epoch: 5 Global Step: 28090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:37,023-Speed 3447.65 samples/sec Loss 10.3988 LearningRate 0.1210 Epoch: 5 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:39,959-Speed 3489.28 samples/sec Loss 10.3839 LearningRate 0.1210 Epoch: 5 Global Step: 28110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:53:42,881-Speed 3505.66 samples/sec Loss 10.5031 LearningRate 0.1209 Epoch: 5 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:45,821-Speed 3484.01 samples/sec Loss 10.5709 LearningRate 0.1209 Epoch: 5 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:48,769-Speed 3473.93 samples/sec Loss 10.4587 
LearningRate 0.1209 Epoch: 5 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:51,803-Speed 3376.16 samples/sec Loss 10.3965 LearningRate 0.1209 Epoch: 5 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:54,765-Speed 3458.40 samples/sec Loss 10.3693 LearningRate 0.1208 Epoch: 5 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:53:57,711-Speed 3476.74 samples/sec Loss 10.3547 LearningRate 0.1208 Epoch: 5 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:00,662-Speed 3470.71 samples/sec Loss 10.4420 LearningRate 0.1208 Epoch: 5 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:03,657-Speed 3420.57 samples/sec Loss 10.4540 LearningRate 0.1208 Epoch: 5 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:06,599-Speed 3481.84 samples/sec Loss 10.5319 LearningRate 0.1207 Epoch: 5 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:09,558-Speed 3462.00 samples/sec Loss 10.4125 LearningRate 0.1207 Epoch: 5 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:12,491-Speed 3491.37 samples/sec Loss 10.4131 LearningRate 0.1207 Epoch: 5 Global Step: 28220 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:54:15,422-Speed 3495.67 samples/sec Loss 10.4951 LearningRate 0.1207 Epoch: 5 Global Step: 28230 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:54:18,353-Speed 3494.35 samples/sec Loss 10.4561 LearningRate 0.1206 Epoch: 5 Global Step: 28240 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:54:21,289-Speed 3488.53 samples/sec Loss 10.3736 LearningRate 0.1206 Epoch: 5 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:24,214-Speed 3501.77 samples/sec Loss 10.6644 LearningRate 0.1206 Epoch: 5 
Global Step: 28260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:27,142-Speed 3498.21 samples/sec Loss 10.3944 LearningRate 0.1206 Epoch: 5 Global Step: 28270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:30,069-Speed 3499.73 samples/sec Loss 10.2837 LearningRate 0.1205 Epoch: 5 Global Step: 28280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:33,065-Speed 3418.51 samples/sec Loss 10.3082 LearningRate 0.1205 Epoch: 5 Global Step: 28290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:35,993-Speed 3498.61 samples/sec Loss 10.2961 LearningRate 0.1205 Epoch: 5 Global Step: 28300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:38,953-Speed 3459.97 samples/sec Loss 10.4114 LearningRate 0.1205 Epoch: 5 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:41,883-Speed 3496.54 samples/sec Loss 10.3735 LearningRate 0.1204 Epoch: 5 Global Step: 28320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:44,806-Speed 3504.42 samples/sec Loss 10.5318 LearningRate 0.1204 Epoch: 5 Global Step: 28330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:47,735-Speed 3497.51 samples/sec Loss 10.5992 LearningRate 0.1204 Epoch: 5 Global Step: 28340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:50,678-Speed 3479.42 samples/sec Loss 10.5876 LearningRate 0.1204 Epoch: 5 Global Step: 28350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:53,692-Speed 3398.66 samples/sec Loss 10.3271 LearningRate 0.1203 Epoch: 5 Global Step: 28360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:56,690-Speed 3416.49 samples/sec Loss 10.5069 LearningRate 0.1203 Epoch: 5 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:54:59,651-Speed 3459.79 samples/sec Loss 10.3399 LearningRate 0.1203 Epoch: 5 Global Step: 28380 Fp16 Grad 
Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:02,605-Speed 3466.48 samples/sec Loss 10.6506 LearningRate 0.1203 Epoch: 5 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:05,532-Speed 3500.23 samples/sec Loss 10.3364 LearningRate 0.1203 Epoch: 5 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:08,462-Speed 3494.96 samples/sec Loss 10.5035 LearningRate 0.1202 Epoch: 5 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:11,419-Speed 3463.91 samples/sec Loss 10.2810 LearningRate 0.1202 Epoch: 5 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:14,364-Speed 3478.88 samples/sec Loss 10.2624 LearningRate 0.1202 Epoch: 5 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:17,324-Speed 3460.11 samples/sec Loss 10.2839 LearningRate 0.1202 Epoch: 5 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:20,258-Speed 3491.10 samples/sec Loss 10.3588 LearningRate 0.1201 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:55:23,183-Speed 3502.06 samples/sec Loss 10.5644 LearningRate 0.1201 Epoch: 5 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:26,125-Speed 3481.16 samples/sec Loss 10.5800 LearningRate 0.1201 Epoch: 5 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:55:29,042-Speed 3511.29 samples/sec Loss 10.3361 LearningRate 0.1201 Epoch: 5 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:31,984-Speed 3481.60 samples/sec Loss 10.5833 LearningRate 0.1200 Epoch: 5 Global Step: 28490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:34,981-Speed 3417.88 samples/sec Loss 10.3295 LearningRate 0.1200 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-01-19 19:55:38,062-Speed 3324.57 samples/sec Loss 10.3873 LearningRate 0.1200 Epoch: 5 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:40,990-Speed 3498.81 samples/sec Loss 10.5565 LearningRate 0.1200 Epoch: 5 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:43,918-Speed 3498.33 samples/sec Loss 10.4574 LearningRate 0.1199 Epoch: 5 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:46,841-Speed 3503.65 samples/sec Loss 10.4702 LearningRate 0.1199 Epoch: 5 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:49,766-Speed 3501.81 samples/sec Loss 10.3624 LearningRate 0.1199 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:52,726-Speed 3460.44 samples/sec Loss 10.3819 LearningRate 0.1199 Epoch: 5 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:55,687-Speed 3460.06 samples/sec Loss 10.4894 LearningRate 0.1198 Epoch: 5 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:55:58,699-Speed 3399.95 samples/sec Loss 10.3258 LearningRate 0.1198 Epoch: 5 Global Step: 28580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:01,657-Speed 3463.00 samples/sec Loss 10.4128 LearningRate 0.1198 Epoch: 5 Global Step: 28590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:04,597-Speed 3483.39 samples/sec Loss 10.4514 LearningRate 0.1198 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:07,574-Speed 3441.35 samples/sec Loss 10.3288 LearningRate 0.1197 Epoch: 5 Global Step: 28610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:10,533-Speed 3462.59 samples/sec Loss 10.4982 LearningRate 0.1197 Epoch: 5 Global Step: 28620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 
19:56:13,530-Speed 3417.78 samples/sec Loss 10.3771 LearningRate 0.1197 Epoch: 5 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:16,535-Speed 3407.79 samples/sec Loss 10.2995 LearningRate 0.1197 Epoch: 5 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:19,480-Speed 3478.22 samples/sec Loss 10.1644 LearningRate 0.1196 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:22,410-Speed 3496.05 samples/sec Loss 10.4630 LearningRate 0.1196 Epoch: 5 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:25,413-Speed 3411.28 samples/sec Loss 10.5545 LearningRate 0.1196 Epoch: 5 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:56:28,341-Speed 3497.22 samples/sec Loss 10.6194 LearningRate 0.1196 Epoch: 5 Global Step: 28680 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 19:56:31,260-Speed 3508.89 samples/sec Loss 10.3718 LearningRate 0.1195 Epoch: 5 Global Step: 28690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:56:34,202-Speed 3482.62 samples/sec Loss 10.2240 LearningRate 0.1195 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:56:37,135-Speed 3492.49 samples/sec Loss 10.3345 LearningRate 0.1195 Epoch: 5 Global Step: 28710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:56:40,059-Speed 3502.97 samples/sec Loss 10.2692 LearningRate 0.1195 Epoch: 5 Global Step: 28720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:56:42,983-Speed 3502.26 samples/sec Loss 10.3820 LearningRate 0.1194 Epoch: 5 Global Step: 28730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:56:45,932-Speed 3473.81 samples/sec Loss 10.2539 LearningRate 0.1194 Epoch: 5 Global Step: 28740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:56:48,859-Speed 3498.75 
samples/sec Loss 10.3191 LearningRate 0.1194 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:56:51,779-Speed 3508.38 samples/sec Loss 10.3615 LearningRate 0.1194 Epoch: 5 Global Step: 28760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:56:54,720-Speed 3482.55 samples/sec Loss 10.2407 LearningRate 0.1193 Epoch: 5 Global Step: 28770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:56:57,676-Speed 3464.65 samples/sec Loss 10.4270 LearningRate 0.1193 Epoch: 5 Global Step: 28780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:57:00,677-Speed 3413.68 samples/sec Loss 10.1732 LearningRate 0.1193 Epoch: 5 Global Step: 28790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:57:03,628-Speed 3471.49 samples/sec Loss 10.3576 LearningRate 0.1193 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:57:06,697-Speed 3337.86 samples/sec Loss 10.4136 LearningRate 0.1192 Epoch: 5 Global Step: 28810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:57:09,711-Speed 3398.21 samples/sec Loss 10.3480 LearningRate 0.1192 Epoch: 5 Global Step: 28820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:57:12,645-Speed 3490.98 samples/sec Loss 10.3781 LearningRate 0.1192 Epoch: 5 Global Step: 28830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:57:15,575-Speed 3496.45 samples/sec Loss 10.4346 LearningRate 0.1192 Epoch: 5 Global Step: 28840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:57:18,540-Speed 3454.01 samples/sec Loss 10.2402 LearningRate 0.1191 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-19 19:57:21,467-Speed 3498.47 samples/sec Loss 10.5234 LearningRate 0.1191 Epoch: 5 Global Step: 28860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:24,401-Speed 3491.66 samples/sec Loss 10.3382 LearningRate 
0.1191 Epoch: 5 Global Step: 28870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:27,363-Speed 3458.05 samples/sec Loss 10.4562 LearningRate 0.1191 Epoch: 5 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:30,353-Speed 3425.74 samples/sec Loss 10.3989 LearningRate 0.1191 Epoch: 5 Global Step: 28890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:33,326-Speed 3445.87 samples/sec Loss 10.4051 LearningRate 0.1190 Epoch: 5 Global Step: 28900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:36,380-Speed 3353.95 samples/sec Loss 10.4980 LearningRate 0.1190 Epoch: 5 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:39,317-Speed 3487.83 samples/sec Loss 10.2883 LearningRate 0.1190 Epoch: 5 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:42,258-Speed 3482.59 samples/sec Loss 10.4013 LearningRate 0.1190 Epoch: 5 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:45,190-Speed 3493.09 samples/sec Loss 10.2692 LearningRate 0.1189 Epoch: 5 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:48,119-Speed 3497.37 samples/sec Loss 10.5288 LearningRate 0.1189 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:57:51,053-Speed 3490.63 samples/sec Loss 10.4000 LearningRate 0.1189 Epoch: 5 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:57:53,979-Speed 3500.87 samples/sec Loss 10.5509 LearningRate 0.1189 Epoch: 5 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:57:56,905-Speed 3500.14 samples/sec Loss 10.2550 LearningRate 0.1188 Epoch: 5 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:57:59,835-Speed 3496.88 samples/sec Loss 10.3131 LearningRate 0.1188 Epoch: 5 Global Step: 28990 Fp16 
Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:02,763-Speed 3497.90 samples/sec Loss 10.2653 LearningRate 0.1188 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:05,693-Speed 3496.97 samples/sec Loss 10.2688 LearningRate 0.1188 Epoch: 5 Global Step: 29010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:08,620-Speed 3498.96 samples/sec Loss 10.3915 LearningRate 0.1187 Epoch: 5 Global Step: 29020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:11,546-Speed 3500.51 samples/sec Loss 10.1625 LearningRate 0.1187 Epoch: 5 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:14,481-Speed 3496.16 samples/sec Loss 10.3054 LearningRate 0.1187 Epoch: 5 Global Step: 29040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:17,425-Speed 3477.97 samples/sec Loss 10.3058 LearningRate 0.1187 Epoch: 5 Global Step: 29050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:20,354-Speed 3497.81 samples/sec Loss 10.3112 LearningRate 0.1186 Epoch: 5 Global Step: 29060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:23,297-Speed 3480.43 samples/sec Loss 10.3604 LearningRate 0.1186 Epoch: 5 Global Step: 29070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:26,227-Speed 3494.74 samples/sec Loss 10.2710 LearningRate 0.1186 Epoch: 5 Global Step: 29080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:29,160-Speed 3493.42 samples/sec Loss 10.3763 LearningRate 0.1186 Epoch: 5 Global Step: 29090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:32,096-Speed 3488.54 samples/sec Loss 10.3721 LearningRate 0.1185 Epoch: 5 Global Step: 29100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:58:35,081-Speed 3431.70 samples/sec Loss 10.3349 LearningRate 0.1185 Epoch: 5 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-01-19 19:58:38,106-Speed 3385.99 samples/sec Loss 10.2747 LearningRate 0.1185 Epoch: 5 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:58:41,063-Speed 3463.50 samples/sec Loss 10.3649 LearningRate 0.1185 Epoch: 5 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:58:44,015-Speed 3469.60 samples/sec Loss 10.2670 LearningRate 0.1184 Epoch: 5 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:58:46,978-Speed 3456.60 samples/sec Loss 10.2078 LearningRate 0.1184 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:58:49,940-Speed 3457.60 samples/sec Loss 10.1530 LearningRate 0.1184 Epoch: 5 Global Step: 29160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:58:52,970-Speed 3380.52 samples/sec Loss 10.2370 LearningRate 0.1184 Epoch: 5 Global Step: 29170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:58:55,914-Speed 3479.52 samples/sec Loss 10.4722 LearningRate 0.1183 Epoch: 5 Global Step: 29180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:58:58,922-Speed 3405.76 samples/sec Loss 10.2088 LearningRate 0.1183 Epoch: 5 Global Step: 29190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:01,877-Speed 3466.42 samples/sec Loss 10.2395 LearningRate 0.1183 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:04,822-Speed 3478.05 samples/sec Loss 10.3086 LearningRate 0.1183 Epoch: 5 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:59:07,736-Speed 3514.12 samples/sec Loss 10.4189 LearningRate 0.1182 Epoch: 5 Global Step: 29220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:10,671-Speed 3490.72 samples/sec Loss 10.1782 LearningRate 0.1182 Epoch: 5 Global Step: 29230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 
19:59:13,603-Speed 3492.39 samples/sec Loss 10.4177 LearningRate 0.1182 Epoch: 5 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:16,562-Speed 3461.60 samples/sec Loss 10.0807 LearningRate 0.1182 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:19,492-Speed 3495.57 samples/sec Loss 10.2367 LearningRate 0.1182 Epoch: 5 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:22,503-Speed 3402.52 samples/sec Loss 10.3236 LearningRate 0.1181 Epoch: 5 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:25,464-Speed 3459.47 samples/sec Loss 10.3572 LearningRate 0.1181 Epoch: 5 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:28,409-Speed 3478.08 samples/sec Loss 10.1701 LearningRate 0.1181 Epoch: 5 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:31,374-Speed 3453.81 samples/sec Loss 10.2470 LearningRate 0.1181 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:34,308-Speed 3491.98 samples/sec Loss 10.3867 LearningRate 0.1180 Epoch: 5 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 19:59:37,243-Speed 3489.14 samples/sec Loss 10.1918 LearningRate 0.1180 Epoch: 5 Global Step: 29320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:59:40,184-Speed 3482.83 samples/sec Loss 10.1994 LearningRate 0.1180 Epoch: 5 Global Step: 29330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:59:43,208-Speed 3386.86 samples/sec Loss 10.2272 LearningRate 0.1180 Epoch: 5 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:59:46,167-Speed 3461.21 samples/sec Loss 10.2197 LearningRate 0.1179 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:59:49,112-Speed 3478.62 samples/sec 
Loss 10.2729 LearningRate 0.1179 Epoch: 5 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:59:52,044-Speed 3493.58 samples/sec Loss 10.1615 LearningRate 0.1179 Epoch: 5 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:59:54,972-Speed 3498.85 samples/sec Loss 10.4898 LearningRate 0.1179 Epoch: 5 Global Step: 29380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 19:59:57,988-Speed 3395.66 samples/sec Loss 10.3428 LearningRate 0.1178 Epoch: 5 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:00,931-Speed 3479.93 samples/sec Loss 10.1961 LearningRate 0.1178 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:03,876-Speed 3478.19 samples/sec Loss 10.3901 LearningRate 0.1178 Epoch: 5 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:06,803-Speed 3499.88 samples/sec Loss 10.3781 LearningRate 0.1178 Epoch: 5 Global Step: 29420 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:00:09,723-Speed 3507.64 samples/sec Loss 10.3589 LearningRate 0.1177 Epoch: 5 Global Step: 29430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:12,667-Speed 3479.95 samples/sec Loss 10.1844 LearningRate 0.1177 Epoch: 5 Global Step: 29440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:15,617-Speed 3471.56 samples/sec Loss 10.2594 LearningRate 0.1177 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:18,552-Speed 3490.35 samples/sec Loss 10.2196 LearningRate 0.1177 Epoch: 5 Global Step: 29460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:21,489-Speed 3487.35 samples/sec Loss 10.0180 LearningRate 0.1176 Epoch: 5 Global Step: 29470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:24,440-Speed 3471.03 samples/sec Loss 10.3585 LearningRate 
0.1176 Epoch: 5 Global Step: 29480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:27,373-Speed 3491.92 samples/sec Loss 10.2094 LearningRate 0.1176 Epoch: 5 Global Step: 29490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:30,312-Speed 3485.92 samples/sec Loss 10.4749 LearningRate 0.1176 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:33,244-Speed 3493.66 samples/sec Loss 10.2719 LearningRate 0.1175 Epoch: 5 Global Step: 29510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:36,174-Speed 3495.81 samples/sec Loss 10.3029 LearningRate 0.1175 Epoch: 5 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:39,103-Speed 3497.15 samples/sec Loss 10.3850 LearningRate 0.1175 Epoch: 5 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:42,059-Speed 3463.97 samples/sec Loss 10.4316 LearningRate 0.1175 Epoch: 5 Global Step: 29540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:45,022-Speed 3457.94 samples/sec Loss 10.4175 LearningRate 0.1174 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:47,975-Speed 3468.52 samples/sec Loss 10.2803 LearningRate 0.1174 Epoch: 5 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:50,902-Speed 3500.03 samples/sec Loss 10.0496 LearningRate 0.1174 Epoch: 5 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:53,838-Speed 3488.33 samples/sec Loss 10.0597 LearningRate 0.1174 Epoch: 5 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:56,769-Speed 3494.68 samples/sec Loss 10.1667 LearningRate 0.1173 Epoch: 5 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:00:59,701-Speed 3493.35 samples/sec Loss 10.2083 LearningRate 0.1173 Epoch: 5 Global Step: 
29600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:02,631-Speed 3495.66 samples/sec Loss 10.2912 LearningRate 0.1173 Epoch: 5 Global Step: 29610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:05,575-Speed 3479.31 samples/sec Loss 10.2455 LearningRate 0.1173 Epoch: 5 Global Step: 29620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:08,544-Speed 3449.46 samples/sec Loss 10.1132 LearningRate 0.1173 Epoch: 5 Global Step: 29630 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:01:11,544-Speed 3413.81 samples/sec Loss 10.3389 LearningRate 0.1172 Epoch: 5 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:14,567-Speed 3388.95 samples/sec Loss 10.3445 LearningRate 0.1172 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:17,522-Speed 3466.94 samples/sec Loss 10.4424 LearningRate 0.1172 Epoch: 5 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:20,458-Speed 3488.08 samples/sec Loss 10.3224 LearningRate 0.1172 Epoch: 5 Global Step: 29670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:23,419-Speed 3459.34 samples/sec Loss 10.3810 LearningRate 0.1171 Epoch: 5 Global Step: 29680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:26,436-Speed 3394.84 samples/sec Loss 10.1387 LearningRate 0.1171 Epoch: 5 Global Step: 29690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:29,396-Speed 3461.15 samples/sec Loss 10.1027 LearningRate 0.1171 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:32,322-Speed 3500.58 samples/sec Loss 10.2026 LearningRate 0.1171 Epoch: 5 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:35,248-Speed 3500.74 samples/sec Loss 10.2944 LearningRate 0.1170 Epoch: 5 Global Step: 29720 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-01-19 20:01:38,191-Speed 3479.55 samples/sec Loss 10.2077 LearningRate 0.1170 Epoch: 5 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:41,121-Speed 3497.13 samples/sec Loss 10.2152 LearningRate 0.1170 Epoch: 5 Global Step: 29740 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:01:44,038-Speed 3511.15 samples/sec Loss 10.2992 LearningRate 0.1170 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:46,981-Speed 3479.99 samples/sec Loss 10.1047 LearningRate 0.1169 Epoch: 5 Global Step: 29760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:49,914-Speed 3492.66 samples/sec Loss 10.3226 LearningRate 0.1169 Epoch: 5 Global Step: 29770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:52,847-Speed 3492.28 samples/sec Loss 10.3990 LearningRate 0.1169 Epoch: 5 Global Step: 29780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:55,856-Speed 3403.85 samples/sec Loss 10.1638 LearningRate 0.1169 Epoch: 5 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:01:58,789-Speed 3492.27 samples/sec Loss 10.3344 LearningRate 0.1168 Epoch: 5 Global Step: 29800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:01,751-Speed 3457.29 samples/sec Loss 10.2114 LearningRate 0.1168 Epoch: 5 Global Step: 29810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:04,683-Speed 3493.57 samples/sec Loss 10.1521 LearningRate 0.1168 Epoch: 5 Global Step: 29820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:07,612-Speed 3497.18 samples/sec Loss 10.1599 LearningRate 0.1168 Epoch: 5 Global Step: 29830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:10,546-Speed 3492.37 samples/sec Loss 10.0297 LearningRate 0.1167 Epoch: 5 Global Step: 29840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-19 20:02:13,465-Speed 3509.21 samples/sec Loss 10.3403 LearningRate 0.1167 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:16,394-Speed 3496.27 samples/sec Loss 10.2262 LearningRate 0.1167 Epoch: 5 Global Step: 29860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:19,321-Speed 3498.88 samples/sec Loss 10.3293 LearningRate 0.1167 Epoch: 5 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:22,303-Speed 3435.39 samples/sec Loss 10.2673 LearningRate 0.1166 Epoch: 5 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:25,254-Speed 3471.32 samples/sec Loss 10.0905 LearningRate 0.1166 Epoch: 5 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:28,221-Speed 3452.24 samples/sec Loss 10.1484 LearningRate 0.1166 Epoch: 5 Global Step: 29900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:31,168-Speed 3475.95 samples/sec Loss 10.1035 LearningRate 0.1166 Epoch: 5 Global Step: 29910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:34,112-Speed 3478.96 samples/sec Loss 10.2097 LearningRate 0.1166 Epoch: 5 Global Step: 29920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:37,051-Speed 3484.90 samples/sec Loss 10.2613 LearningRate 0.1165 Epoch: 5 Global Step: 29930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:39,996-Speed 3478.38 samples/sec Loss 10.3859 LearningRate 0.1165 Epoch: 5 Global Step: 29940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:42,987-Speed 3424.31 samples/sec Loss 10.1083 LearningRate 0.1165 Epoch: 5 Global Step: 29950 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:02:45,934-Speed 3475.53 samples/sec Loss 10.1987 LearningRate 0.1165 Epoch: 5 Global Step: 29960 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:02:48,870-Speed 
3488.74 samples/sec Loss 10.2811 LearningRate 0.1164 Epoch: 5 Global Step: 29970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:51,803-Speed 3492.31 samples/sec Loss 10.1737 LearningRate 0.1164 Epoch: 5 Global Step: 29980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:54,740-Speed 3489.08 samples/sec Loss 10.0669 LearningRate 0.1164 Epoch: 5 Global Step: 29990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:02:57,689-Speed 3472.27 samples/sec Loss 10.1445 LearningRate 0.1164 Epoch: 5 Global Step: 30000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:03:40,842-[lfw][30000]XNorm: 21.718190 Training: 2022-01-19 20:03:40,842-[lfw][30000]Accuracy-Flip: 0.99650+-0.00345 Training: 2022-01-19 20:03:40,843-[lfw][30000]Accuracy-Highest: 0.99700 Training: 2022-01-19 20:04:30,803-[cfp_fp][30000]XNorm: 18.946152 Training: 2022-01-19 20:04:30,803-[cfp_fp][30000]Accuracy-Flip: 0.95057+-0.01071 Training: 2022-01-19 20:04:30,804-[cfp_fp][30000]Accuracy-Highest: 0.95514 Training: 2022-01-19 20:05:13,767-[agedb_30][30000]XNorm: 21.405944 Training: 2022-01-19 20:05:13,768-[agedb_30][30000]Accuracy-Flip: 0.96967+-0.00894 Training: 2022-01-19 20:05:13,768-[agedb_30][30000]Accuracy-Highest: 0.96983 Training: 2022-01-19 20:05:16,759-Speed 73.63 samples/sec Loss 10.2165 LearningRate 0.1163 Epoch: 5 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:19,708-Speed 3473.19 samples/sec Loss 10.2229 LearningRate 0.1163 Epoch: 5 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:22,631-Speed 3504.55 samples/sec Loss 10.3376 LearningRate 0.1163 Epoch: 5 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:25,607-Speed 3442.14 samples/sec Loss 10.2785 LearningRate 0.1163 Epoch: 5 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:28,549-Speed 3481.38 samples/sec Loss 
10.3596 LearningRate 0.1162 Epoch: 5 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:31,518-Speed 3449.57 samples/sec Loss 10.4463 LearningRate 0.1162 Epoch: 5 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:34,435-Speed 3512.27 samples/sec Loss 10.2147 LearningRate 0.1162 Epoch: 5 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:37,362-Speed 3498.62 samples/sec Loss 10.3954 LearningRate 0.1162 Epoch: 5 Global Step: 30080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:40,304-Speed 3482.59 samples/sec Loss 10.2307 LearningRate 0.1161 Epoch: 5 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:43,233-Speed 3496.21 samples/sec Loss 10.3997 LearningRate 0.1161 Epoch: 5 Global Step: 30100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:46,162-Speed 3498.25 samples/sec Loss 10.2446 LearningRate 0.1161 Epoch: 5 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:49,092-Speed 3495.22 samples/sec Loss 10.4152 LearningRate 0.1161 Epoch: 5 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:52,016-Speed 3503.10 samples/sec Loss 10.1945 LearningRate 0.1160 Epoch: 5 Global Step: 30130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:55,031-Speed 3397.17 samples/sec Loss 10.2263 LearningRate 0.1160 Epoch: 5 Global Step: 30140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:05:58,000-Speed 3448.93 samples/sec Loss 10.1235 LearningRate 0.1160 Epoch: 5 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:00,929-Speed 3497.81 samples/sec Loss 10.0535 LearningRate 0.1160 Epoch: 5 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:03,854-Speed 3502.32 samples/sec Loss 10.1776 LearningRate 0.1159 
Epoch: 5 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:06,778-Speed 3502.99 samples/sec Loss 10.0047 LearningRate 0.1159 Epoch: 5 Global Step: 30180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:09,707-Speed 3496.35 samples/sec Loss 10.1895 LearningRate 0.1159 Epoch: 5 Global Step: 30190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:12,640-Speed 3491.98 samples/sec Loss 10.2227 LearningRate 0.1159 Epoch: 5 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:15,568-Speed 3499.83 samples/sec Loss 10.0676 LearningRate 0.1159 Epoch: 5 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:18,497-Speed 3496.03 samples/sec Loss 10.1799 LearningRate 0.1158 Epoch: 5 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:21,419-Speed 3506.45 samples/sec Loss 10.0726 LearningRate 0.1158 Epoch: 5 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:24,350-Speed 3493.99 samples/sec Loss 10.1343 LearningRate 0.1158 Epoch: 5 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:27,363-Speed 3399.04 samples/sec Loss 10.2079 LearningRate 0.1158 Epoch: 5 Global Step: 30250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:06:30,347-Speed 3434.13 samples/sec Loss 10.2279 LearningRate 0.1157 Epoch: 5 Global Step: 30260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:06:33,312-Speed 3453.64 samples/sec Loss 10.1620 LearningRate 0.1157 Epoch: 5 Global Step: 30270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:06:36,277-Speed 3455.70 samples/sec Loss 10.2316 LearningRate 0.1157 Epoch: 5 Global Step: 30280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:06:39,205-Speed 3497.93 samples/sec Loss 10.1532 LearningRate 0.1157 Epoch: 5 Global Step: 30290 Fp16 Grad 
Scale: 131072 Required: 10 hours Training: 2022-01-19 20:06:42,119-Speed 3513.95 samples/sec Loss 10.1469 LearningRate 0.1156 Epoch: 5 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:45,052-Speed 3492.64 samples/sec Loss 10.2811 LearningRate 0.1156 Epoch: 5 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:47,975-Speed 3503.71 samples/sec Loss 10.1306 LearningRate 0.1156 Epoch: 5 Global Step: 30320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:50,911-Speed 3489.25 samples/sec Loss 10.4920 LearningRate 0.1156 Epoch: 5 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:06:53,914-Speed 3410.53 samples/sec Loss 10.4801 LearningRate 0.1155 Epoch: 5 Global Step: 30340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:07:07,095-Speed 777.08 samples/sec Loss 10.1484 LearningRate 0.1155 Epoch: 6 Global Step: 30350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:07:10,102-Speed 3406.04 samples/sec Loss 9.4582 LearningRate 0.1155 Epoch: 6 Global Step: 30360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:07:13,314-Speed 3189.57 samples/sec Loss 9.3994 LearningRate 0.1155 Epoch: 6 Global Step: 30370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:07:16,233-Speed 3508.40 samples/sec Loss 9.4262 LearningRate 0.1154 Epoch: 6 Global Step: 30380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:07:19,189-Speed 3464.92 samples/sec Loss 9.3641 LearningRate 0.1154 Epoch: 6 Global Step: 30390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:07:22,137-Speed 3475.01 samples/sec Loss 9.4032 LearningRate 0.1154 Epoch: 6 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:25,077-Speed 3483.24 samples/sec Loss 9.4864 LearningRate 0.1154 Epoch: 6 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-19 20:07:28,002-Speed 3502.33 samples/sec Loss 9.4059 LearningRate 0.1153 Epoch: 6 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:30,920-Speed 3510.05 samples/sec Loss 9.4302 LearningRate 0.1153 Epoch: 6 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:33,985-Speed 3342.56 samples/sec Loss 9.5934 LearningRate 0.1153 Epoch: 6 Global Step: 30440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:36,911-Speed 3500.58 samples/sec Loss 9.5673 LearningRate 0.1153 Epoch: 6 Global Step: 30450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:39,835-Speed 3503.31 samples/sec Loss 9.6594 LearningRate 0.1153 Epoch: 6 Global Step: 30460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:42,773-Speed 3486.29 samples/sec Loss 9.6057 LearningRate 0.1152 Epoch: 6 Global Step: 30470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:45,721-Speed 3474.23 samples/sec Loss 9.6750 LearningRate 0.1152 Epoch: 6 Global Step: 30480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:48,719-Speed 3415.75 samples/sec Loss 9.5803 LearningRate 0.1152 Epoch: 6 Global Step: 30490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:51,777-Speed 3350.27 samples/sec Loss 9.6507 LearningRate 0.1152 Epoch: 6 Global Step: 30500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:54,709-Speed 3492.73 samples/sec Loss 9.7098 LearningRate 0.1151 Epoch: 6 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:07:57,664-Speed 3466.08 samples/sec Loss 9.5941 LearningRate 0.1151 Epoch: 6 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:00,595-Speed 3500.83 samples/sec Loss 9.7201 LearningRate 0.1151 Epoch: 6 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:03,538-Speed 3480.34 
samples/sec Loss 9.7157 LearningRate 0.1151 Epoch: 6 Global Step: 30540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:06,470-Speed 3493.54 samples/sec Loss 9.6156 LearningRate 0.1150 Epoch: 6 Global Step: 30550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:09,393-Speed 3504.58 samples/sec Loss 9.7767 LearningRate 0.1150 Epoch: 6 Global Step: 30560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:12,363-Speed 3448.60 samples/sec Loss 9.7719 LearningRate 0.1150 Epoch: 6 Global Step: 30570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:15,307-Speed 3479.04 samples/sec Loss 9.6885 LearningRate 0.1150 Epoch: 6 Global Step: 30580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:18,275-Speed 3450.48 samples/sec Loss 9.7311 LearningRate 0.1149 Epoch: 6 Global Step: 30590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:21,202-Speed 3499.60 samples/sec Loss 9.7098 LearningRate 0.1149 Epoch: 6 Global Step: 30600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:24,173-Speed 3448.48 samples/sec Loss 9.7529 LearningRate 0.1149 Epoch: 6 Global Step: 30610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:27,125-Speed 3469.68 samples/sec Loss 9.6234 LearningRate 0.1149 Epoch: 6 Global Step: 30620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:30,068-Speed 3480.14 samples/sec Loss 9.7800 LearningRate 0.1148 Epoch: 6 Global Step: 30630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:33,019-Speed 3471.49 samples/sec Loss 9.8288 LearningRate 0.1148 Epoch: 6 Global Step: 30640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:35,948-Speed 3496.69 samples/sec Loss 9.7744 LearningRate 0.1148 Epoch: 6 Global Step: 30650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:38,891-Speed 3480.23 samples/sec Loss 9.8301 LearningRate 0.1148 Epoch: 6 
Global Step: 30660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:41,826-Speed 3490.19 samples/sec Loss 9.8516 LearningRate 0.1147 Epoch: 6 Global Step: 30670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:44,790-Speed 3455.11 samples/sec Loss 9.6434 LearningRate 0.1147 Epoch: 6 Global Step: 30680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:47,760-Speed 3449.69 samples/sec Loss 9.8232 LearningRate 0.1147 Epoch: 6 Global Step: 30690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:50,721-Speed 3458.67 samples/sec Loss 9.8483 LearningRate 0.1147 Epoch: 6 Global Step: 30700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:53,665-Speed 3478.86 samples/sec Loss 9.6654 LearningRate 0.1147 Epoch: 6 Global Step: 30710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:08:56,598-Speed 3493.33 samples/sec Loss 9.8087 LearningRate 0.1146 Epoch: 6 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:08:59,570-Speed 3446.45 samples/sec Loss 9.7455 LearningRate 0.1146 Epoch: 6 Global Step: 30730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:02,499-Speed 3495.76 samples/sec Loss 9.8195 LearningRate 0.1146 Epoch: 6 Global Step: 30740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:05,425-Speed 3501.19 samples/sec Loss 9.6643 LearningRate 0.1146 Epoch: 6 Global Step: 30750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:08,365-Speed 3483.84 samples/sec Loss 9.8463 LearningRate 0.1145 Epoch: 6 Global Step: 30760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:11,304-Speed 3485.17 samples/sec Loss 9.9747 LearningRate 0.1145 Epoch: 6 Global Step: 30770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:14,232-Speed 3498.23 samples/sec Loss 10.0377 LearningRate 0.1145 Epoch: 6 Global Step: 30780 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-01-19 20:09:17,204-Speed 3446.37 samples/sec Loss 9.9602 LearningRate 0.1145 Epoch: 6 Global Step: 30790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:20,141-Speed 3488.33 samples/sec Loss 9.9481 LearningRate 0.1144 Epoch: 6 Global Step: 30800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:23,096-Speed 3465.41 samples/sec Loss 9.9576 LearningRate 0.1144 Epoch: 6 Global Step: 30810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:26,025-Speed 3497.17 samples/sec Loss 9.8701 LearningRate 0.1144 Epoch: 6 Global Step: 30820 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:09:28,953-Speed 3498.15 samples/sec Loss 9.9228 LearningRate 0.1144 Epoch: 6 Global Step: 30830 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:09:31,894-Speed 3483.25 samples/sec Loss 9.9610 LearningRate 0.1143 Epoch: 6 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:34,864-Speed 3447.85 samples/sec Loss 9.9664 LearningRate 0.1143 Epoch: 6 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:37,804-Speed 3484.26 samples/sec Loss 9.9443 LearningRate 0.1143 Epoch: 6 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:40,733-Speed 3498.20 samples/sec Loss 9.9843 LearningRate 0.1143 Epoch: 6 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:43,720-Speed 3428.89 samples/sec Loss 10.0825 LearningRate 0.1142 Epoch: 6 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:46,649-Speed 3497.34 samples/sec Loss 10.1030 LearningRate 0.1142 Epoch: 6 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:49,604-Speed 3465.95 samples/sec Loss 9.8914 LearningRate 0.1142 Epoch: 6 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 
20:09:52,586-Speed 3433.73 samples/sec Loss 9.9025 LearningRate 0.1142 Epoch: 6 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:55,547-Speed 3460.34 samples/sec Loss 10.0739 LearningRate 0.1141 Epoch: 6 Global Step: 30920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:09:58,491-Speed 3478.00 samples/sec Loss 9.8145 LearningRate 0.1141 Epoch: 6 Global Step: 30930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:10:01,454-Speed 3457.11 samples/sec Loss 10.0126 LearningRate 0.1141 Epoch: 6 Global Step: 30940 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:10:04,504-Speed 3358.79 samples/sec Loss 10.0125 LearningRate 0.1141 Epoch: 6 Global Step: 30950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:10:07,565-Speed 3345.54 samples/sec Loss 10.0049 LearningRate 0.1141 Epoch: 6 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:10,515-Speed 3472.96 samples/sec Loss 9.8985 LearningRate 0.1140 Epoch: 6 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:13,490-Speed 3443.06 samples/sec Loss 9.9727 LearningRate 0.1140 Epoch: 6 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:16,437-Speed 3475.27 samples/sec Loss 9.8970 LearningRate 0.1140 Epoch: 6 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:19,375-Speed 3487.15 samples/sec Loss 9.8622 LearningRate 0.1140 Epoch: 6 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:22,319-Speed 3478.45 samples/sec Loss 9.9378 LearningRate 0.1139 Epoch: 6 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:25,273-Speed 3468.20 samples/sec Loss 10.0777 LearningRate 0.1139 Epoch: 6 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:28,208-Speed 3489.16 samples/sec Loss 
9.9343 LearningRate 0.1139 Epoch: 6 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:31,154-Speed 3477.47 samples/sec Loss 9.9303 LearningRate 0.1139 Epoch: 6 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:34,164-Speed 3402.46 samples/sec Loss 9.8765 LearningRate 0.1138 Epoch: 6 Global Step: 31050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:37,128-Speed 3456.42 samples/sec Loss 9.9592 LearningRate 0.1138 Epoch: 6 Global Step: 31060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:10:40,058-Speed 3496.29 samples/sec Loss 10.0824 LearningRate 0.1138 Epoch: 6 Global Step: 31070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:42,998-Speed 3483.59 samples/sec Loss 9.9997 LearningRate 0.1138 Epoch: 6 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:45,945-Speed 3475.56 samples/sec Loss 9.9689 LearningRate 0.1137 Epoch: 6 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:48,872-Speed 3500.00 samples/sec Loss 9.9538 LearningRate 0.1137 Epoch: 6 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:51,825-Speed 3468.07 samples/sec Loss 9.9304 LearningRate 0.1137 Epoch: 6 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:54,807-Speed 3434.51 samples/sec Loss 9.9595 LearningRate 0.1137 Epoch: 6 Global Step: 31120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:10:57,745-Speed 3486.38 samples/sec Loss 10.1207 LearningRate 0.1136 Epoch: 6 Global Step: 31130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:11:00,681-Speed 3489.34 samples/sec Loss 10.0074 LearningRate 0.1136 Epoch: 6 Global Step: 31140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:11:03,610-Speed 3497.83 samples/sec Loss 9.9140 LearningRate 0.1136 Epoch: 6 Global Step: 
31150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:11:06,538-Speed 3498.61 samples/sec Loss 10.0832 LearningRate 0.1136 Epoch: 6 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:11:09,523-Speed 3430.77 samples/sec Loss 10.1766 LearningRate 0.1136 Epoch: 6 Global Step: 31170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:12,488-Speed 3454.23 samples/sec Loss 9.9989 LearningRate 0.1135 Epoch: 6 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:15,476-Speed 3430.17 samples/sec Loss 10.1075 LearningRate 0.1135 Epoch: 6 Global Step: 31190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:18,408-Speed 3492.76 samples/sec Loss 9.9620 LearningRate 0.1135 Epoch: 6 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:21,381-Speed 3445.79 samples/sec Loss 10.0052 LearningRate 0.1135 Epoch: 6 Global Step: 31210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:24,335-Speed 3467.36 samples/sec Loss 10.1554 LearningRate 0.1134 Epoch: 6 Global Step: 31220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:27,327-Speed 3423.47 samples/sec Loss 10.0216 LearningRate 0.1134 Epoch: 6 Global Step: 31230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:30,255-Speed 3498.62 samples/sec Loss 9.9921 LearningRate 0.1134 Epoch: 6 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:33,185-Speed 3496.26 samples/sec Loss 10.0519 LearningRate 0.1134 Epoch: 6 Global Step: 31250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:36,120-Speed 3490.57 samples/sec Loss 10.1055 LearningRate 0.1133 Epoch: 6 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:39,061-Speed 3481.56 samples/sec Loss 9.9931 LearningRate 0.1133 Epoch: 6 Global Step: 31270 Fp16 Grad Scale: 262144 
Required: 10 hours Training: 2022-01-19 20:11:42,046-Speed 3432.17 samples/sec Loss 9.8663 LearningRate 0.1133 Epoch: 6 Global Step: 31280 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:11:44,957-Speed 3518.96 samples/sec Loss 9.9559 LearningRate 0.1133 Epoch: 6 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:47,903-Speed 3475.91 samples/sec Loss 10.0239 LearningRate 0.1132 Epoch: 6 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:50,836-Speed 3492.73 samples/sec Loss 9.9404 LearningRate 0.1132 Epoch: 6 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:11:53,759-Speed 3504.14 samples/sec Loss 9.9158 LearningRate 0.1132 Epoch: 6 Global Step: 31320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:11:56,709-Speed 3473.01 samples/sec Loss 10.1584 LearningRate 0.1132 Epoch: 6 Global Step: 31330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:11:59,640-Speed 3494.98 samples/sec Loss 10.0461 LearningRate 0.1131 Epoch: 6 Global Step: 31340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:12:02,568-Speed 3498.20 samples/sec Loss 10.0905 LearningRate 0.1131 Epoch: 6 Global Step: 31350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:12:05,515-Speed 3474.93 samples/sec Loss 10.0429 LearningRate 0.1131 Epoch: 6 Global Step: 31360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:12:08,447-Speed 3494.25 samples/sec Loss 9.9680 LearningRate 0.1131 Epoch: 6 Global Step: 31370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:12:11,380-Speed 3491.63 samples/sec Loss 9.9746 LearningRate 0.1131 Epoch: 6 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:12:14,334-Speed 3468.01 samples/sec Loss 9.9930 LearningRate 0.1130 Epoch: 6 Global Step: 31390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 
20:12:17,272-Speed 3485.75 samples/sec Loss 9.9213 LearningRate 0.1130 Epoch: 6 Global Step: 31400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:12:20,204-Speed 3493.14 samples/sec Loss 9.7861 LearningRate 0.1130 Epoch: 6 Global Step: 31410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:12:23,196-Speed 3423.25 samples/sec Loss 10.1059 LearningRate 0.1130 Epoch: 6 Global Step: 31420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:26,135-Speed 3486.19 samples/sec Loss 9.9525 LearningRate 0.1129 Epoch: 6 Global Step: 31430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:29,066-Speed 3494.34 samples/sec Loss 9.8465 LearningRate 0.1129 Epoch: 6 Global Step: 31440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:31,998-Speed 3494.02 samples/sec Loss 9.8769 LearningRate 0.1129 Epoch: 6 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:34,946-Speed 3473.74 samples/sec Loss 10.0225 LearningRate 0.1129 Epoch: 6 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:37,897-Speed 3470.91 samples/sec Loss 9.9908 LearningRate 0.1128 Epoch: 6 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:40,908-Speed 3401.90 samples/sec Loss 9.8364 LearningRate 0.1128 Epoch: 6 Global Step: 31480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:43,848-Speed 3483.91 samples/sec Loss 10.0377 LearningRate 0.1128 Epoch: 6 Global Step: 31490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:46,800-Speed 3470.02 samples/sec Loss 9.6832 LearningRate 0.1128 Epoch: 6 Global Step: 31500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:49,781-Speed 3436.03 samples/sec Loss 10.1807 LearningRate 0.1127 Epoch: 6 Global Step: 31510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:52,704-Speed 3504.11 samples/sec 
Loss 9.9540 LearningRate 0.1127 Epoch: 6 Global Step: 31520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:55,636-Speed 3493.70 samples/sec Loss 10.0684 LearningRate 0.1127 Epoch: 6 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:12:58,568-Speed 3493.32 samples/sec Loss 9.9607 LearningRate 0.1127 Epoch: 6 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:01,511-Speed 3481.11 samples/sec Loss 9.9537 LearningRate 0.1126 Epoch: 6 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:04,485-Speed 3444.04 samples/sec Loss 10.0423 LearningRate 0.1126 Epoch: 6 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:07,452-Speed 3451.73 samples/sec Loss 10.0258 LearningRate 0.1126 Epoch: 6 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:10,459-Speed 3406.49 samples/sec Loss 10.1344 LearningRate 0.1126 Epoch: 6 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:13,466-Speed 3405.81 samples/sec Loss 10.0622 LearningRate 0.1126 Epoch: 6 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:16,438-Speed 3446.70 samples/sec Loss 9.8484 LearningRate 0.1125 Epoch: 6 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:19,405-Speed 3451.68 samples/sec Loss 9.8738 LearningRate 0.1125 Epoch: 6 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:22,338-Speed 3493.00 samples/sec Loss 9.8256 LearningRate 0.1125 Epoch: 6 Global Step: 31620 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:13:25,261-Speed 3504.45 samples/sec Loss 9.9607 LearningRate 0.1125 Epoch: 6 Global Step: 31630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:28,196-Speed 3489.37 samples/sec Loss 9.9951 LearningRate 0.1124 Epoch: 
6 Global Step: 31640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:31,128-Speed 3494.20 samples/sec Loss 9.9049 LearningRate 0.1124 Epoch: 6 Global Step: 31650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:34,102-Speed 3444.00 samples/sec Loss 9.8808 LearningRate 0.1124 Epoch: 6 Global Step: 31660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:37,062-Speed 3460.37 samples/sec Loss 9.9920 LearningRate 0.1124 Epoch: 6 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:40,012-Speed 3471.64 samples/sec Loss 9.8358 LearningRate 0.1123 Epoch: 6 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:42,988-Speed 3442.13 samples/sec Loss 9.8247 LearningRate 0.1123 Epoch: 6 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:45,920-Speed 3493.36 samples/sec Loss 9.8094 LearningRate 0.1123 Epoch: 6 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:48,878-Speed 3462.46 samples/sec Loss 9.8203 LearningRate 0.1123 Epoch: 6 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:51,808-Speed 3496.33 samples/sec Loss 9.9636 LearningRate 0.1122 Epoch: 6 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:13:54,742-Speed 3491.71 samples/sec Loss 9.9082 LearningRate 0.1122 Epoch: 6 Global Step: 31730 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:13:57,677-Speed 3489.28 samples/sec Loss 9.9693 LearningRate 0.1122 Epoch: 6 Global Step: 31740 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:14:00,598-Speed 3506.62 samples/sec Loss 9.8711 LearningRate 0.1122 Epoch: 6 Global Step: 31750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:14:03,533-Speed 3489.61 samples/sec Loss 10.0148 LearningRate 0.1122 Epoch: 6 Global Step: 31760 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-01-19 20:14:06,488-Speed 3466.60 samples/sec Loss 9.9553 LearningRate 0.1121 Epoch: 6 Global Step: 31770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:14:09,428-Speed 3484.01 samples/sec Loss 9.9372 LearningRate 0.1121 Epoch: 6 Global Step: 31780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:14:12,441-Speed 3399.71 samples/sec Loss 10.0952 LearningRate 0.1121 Epoch: 6 Global Step: 31790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:14:15,412-Speed 3447.11 samples/sec Loss 9.8165 LearningRate 0.1121 Epoch: 6 Global Step: 31800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:14:18,353-Speed 3482.64 samples/sec Loss 10.1350 LearningRate 0.1120 Epoch: 6 Global Step: 31810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:14:21,286-Speed 3493.31 samples/sec Loss 10.0344 LearningRate 0.1120 Epoch: 6 Global Step: 31820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:14:24,204-Speed 3509.02 samples/sec Loss 9.9775 LearningRate 0.1120 Epoch: 6 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:14:27,140-Speed 3488.94 samples/sec Loss 10.0097 LearningRate 0.1120 Epoch: 6 Global Step: 31840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:14:30,081-Speed 3483.23 samples/sec Loss 9.8911 LearningRate 0.1119 Epoch: 6 Global Step: 31850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-19 20:14:33,031-Speed 3471.97 samples/sec Loss 9.9160 LearningRate 0.1119 Epoch: 6 Global Step: 31860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:14:36,062-Speed 3379.64 samples/sec Loss 10.0822 LearningRate 0.1119 Epoch: 6 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:14:38,993-Speed 3493.33 samples/sec Loss 10.0809 LearningRate 0.1119 Epoch: 6 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 
20:14:41,934-Speed 3483.04 samples/sec Loss 9.8605 LearningRate 0.1118 Epoch: 6 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:14:44,868-Speed 3491.99 samples/sec Loss 9.8307 LearningRate 0.1118 Epoch: 6 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:14:47,798-Speed 3494.89 samples/sec Loss 9.9518 LearningRate 0.1118 Epoch: 6 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:14:50,744-Speed 3477.18 samples/sec Loss 10.1463 LearningRate 0.1118 Epoch: 6 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:14:53,738-Speed 3421.61 samples/sec Loss 10.0017 LearningRate 0.1117 Epoch: 6 Global Step: 31930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:14:56,706-Speed 3451.36 samples/sec Loss 9.9825 LearningRate 0.1117 Epoch: 6 Global Step: 31940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:14:59,674-Speed 3450.97 samples/sec Loss 10.1146 LearningRate 0.1117 Epoch: 6 Global Step: 31950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:15:02,610-Speed 3488.10 samples/sec Loss 9.8284 LearningRate 0.1117 Epoch: 6 Global Step: 31960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:15:05,554-Speed 3479.43 samples/sec Loss 10.0079 LearningRate 0.1117 Epoch: 6 Global Step: 31970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:15:08,497-Speed 3480.54 samples/sec Loss 9.9399 LearningRate 0.1116 Epoch: 6 Global Step: 31980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:15:11,428-Speed 3494.82 samples/sec Loss 9.8669 LearningRate 0.1116 Epoch: 6 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:15:14,365-Speed 3487.61 samples/sec Loss 9.8324 LearningRate 0.1116 Epoch: 6 Global Step: 32000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:15:57,256-[lfw][32000]XNorm: 22.476537 Training: 
2022-01-19 20:15:57,257-[lfw][32000]Accuracy-Flip: 0.99667+-0.00316 Training: 2022-01-19 20:15:57,257-[lfw][32000]Accuracy-Highest: 0.99700 Training: 2022-01-19 20:16:47,047-[cfp_fp][32000]XNorm: 19.710940 Training: 2022-01-19 20:16:47,047-[cfp_fp][32000]Accuracy-Flip: 0.95343+-0.00748 Training: 2022-01-19 20:16:47,048-[cfp_fp][32000]Accuracy-Highest: 0.95514 Training: 2022-01-19 20:17:29,964-[agedb_30][32000]XNorm: 22.207484 Training: 2022-01-19 20:17:29,965-[agedb_30][32000]Accuracy-Flip: 0.96983+-0.01031 Training: 2022-01-19 20:17:29,966-[agedb_30][32000]Accuracy-Highest: 0.96983 Training: 2022-01-19 20:17:32,897-Speed 73.92 samples/sec Loss 9.9244 LearningRate 0.1116 Epoch: 6 Global Step: 32010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:35,819-Speed 3505.15 samples/sec Loss 10.0109 LearningRate 0.1115 Epoch: 6 Global Step: 32020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:38,730-Speed 3518.92 samples/sec Loss 9.9339 LearningRate 0.1115 Epoch: 6 Global Step: 32030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:41,673-Speed 3480.59 samples/sec Loss 9.8743 LearningRate 0.1115 Epoch: 6 Global Step: 32040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:44,591-Speed 3509.97 samples/sec Loss 9.9320 LearningRate 0.1115 Epoch: 6 Global Step: 32050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:47,549-Speed 3464.02 samples/sec Loss 9.9993 LearningRate 0.1114 Epoch: 6 Global Step: 32060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:50,530-Speed 3435.51 samples/sec Loss 10.2782 LearningRate 0.1114 Epoch: 6 Global Step: 32070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:53,538-Speed 3405.22 samples/sec Loss 9.9228 LearningRate 0.1114 Epoch: 6 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:56,463-Speed 3502.87 samples/sec Loss 9.8982 LearningRate 0.1114 Epoch: 6 
Global Step: 32090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:17:59,391-Speed 3497.84 samples/sec Loss 9.9310 LearningRate 0.1113 Epoch: 6 Global Step: 32100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:02,387-Speed 3420.41 samples/sec Loss 9.9229 LearningRate 0.1113 Epoch: 6 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:05,335-Speed 3473.94 samples/sec Loss 9.9208 LearningRate 0.1113 Epoch: 6 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:08,276-Speed 3482.47 samples/sec Loss 9.8519 LearningRate 0.1113 Epoch: 6 Global Step: 32130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:18:11,210-Speed 3491.73 samples/sec Loss 10.0771 LearningRate 0.1113 Epoch: 6 Global Step: 32140 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:18:14,147-Speed 3487.82 samples/sec Loss 9.9638 LearningRate 0.1112 Epoch: 6 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:17,073-Speed 3500.35 samples/sec Loss 9.7905 LearningRate 0.1112 Epoch: 6 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:20,008-Speed 3490.50 samples/sec Loss 10.0004 LearningRate 0.1112 Epoch: 6 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:22,966-Speed 3463.21 samples/sec Loss 9.8612 LearningRate 0.1112 Epoch: 6 Global Step: 32180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:25,892-Speed 3500.41 samples/sec Loss 10.0962 LearningRate 0.1111 Epoch: 6 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:28,849-Speed 3464.26 samples/sec Loss 10.1039 LearningRate 0.1111 Epoch: 6 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:31,781-Speed 3494.40 samples/sec Loss 9.8936 LearningRate 0.1111 Epoch: 6 Global Step: 32210 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-01-19 20:18:34,748-Speed 3452.48 samples/sec Loss 9.8522 LearningRate 0.1111 Epoch: 6 Global Step: 32220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:37,683-Speed 3489.79 samples/sec Loss 9.9385 LearningRate 0.1110 Epoch: 6 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:40,645-Speed 3457.56 samples/sec Loss 9.9642 LearningRate 0.1110 Epoch: 6 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:43,639-Speed 3421.88 samples/sec Loss 10.0153 LearningRate 0.1110 Epoch: 6 Global Step: 32250 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:18:46,552-Speed 3515.53 samples/sec Loss 9.9456 LearningRate 0.1110 Epoch: 6 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:49,478-Speed 3500.95 samples/sec Loss 9.6934 LearningRate 0.1109 Epoch: 6 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:52,490-Speed 3400.43 samples/sec Loss 9.8861 LearningRate 0.1109 Epoch: 6 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:55,428-Speed 3486.61 samples/sec Loss 9.9418 LearningRate 0.1109 Epoch: 6 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:18:58,409-Speed 3435.40 samples/sec Loss 9.7618 LearningRate 0.1109 Epoch: 6 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:01,418-Speed 3404.12 samples/sec Loss 9.9805 LearningRate 0.1109 Epoch: 6 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:04,381-Speed 3457.96 samples/sec Loss 10.0399 LearningRate 0.1108 Epoch: 6 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:07,340-Speed 3462.27 samples/sec Loss 9.9462 LearningRate 0.1108 Epoch: 6 Global Step: 32330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-19 20:19:10,268-Speed 3497.47 samples/sec Loss 9.9019 LearningRate 0.1108 Epoch: 6 Global Step: 32340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:13,200-Speed 3493.73 samples/sec Loss 9.9460 LearningRate 0.1108 Epoch: 6 Global Step: 32350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:16,224-Speed 3386.86 samples/sec Loss 9.9132 LearningRate 0.1107 Epoch: 6 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:19,210-Speed 3430.72 samples/sec Loss 10.0040 LearningRate 0.1107 Epoch: 6 Global Step: 32370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:22,147-Speed 3488.73 samples/sec Loss 10.0218 LearningRate 0.1107 Epoch: 6 Global Step: 32380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:25,083-Speed 3488.42 samples/sec Loss 9.8829 LearningRate 0.1107 Epoch: 6 Global Step: 32390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:28,050-Speed 3451.73 samples/sec Loss 10.0786 LearningRate 0.1106 Epoch: 6 Global Step: 32400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:31,016-Speed 3454.09 samples/sec Loss 9.8971 LearningRate 0.1106 Epoch: 6 Global Step: 32410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:33,950-Speed 3491.47 samples/sec Loss 9.8683 LearningRate 0.1106 Epoch: 6 Global Step: 32420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:36,870-Speed 3507.22 samples/sec Loss 10.1550 LearningRate 0.1106 Epoch: 6 Global Step: 32430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:39,796-Speed 3501.05 samples/sec Loss 9.8075 LearningRate 0.1105 Epoch: 6 Global Step: 32440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:42,731-Speed 3490.54 samples/sec Loss 10.0022 LearningRate 0.1105 Epoch: 6 Global Step: 32450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:45,718-Speed 3427.78 
samples/sec Loss 9.9801 LearningRate 0.1105 Epoch: 6 Global Step: 32460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:48,640-Speed 3506.45 samples/sec Loss 10.0431 LearningRate 0.1105 Epoch: 6 Global Step: 32470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:51,577-Speed 3487.13 samples/sec Loss 9.8752 LearningRate 0.1105 Epoch: 6 Global Step: 32480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:54,499-Speed 3504.86 samples/sec Loss 10.0811 LearningRate 0.1104 Epoch: 6 Global Step: 32490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:19:57,494-Speed 3420.29 samples/sec Loss 9.8831 LearningRate 0.1104 Epoch: 6 Global Step: 32500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:20:00,519-Speed 3386.95 samples/sec Loss 9.9114 LearningRate 0.1104 Epoch: 6 Global Step: 32510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:20:03,445-Speed 3500.39 samples/sec Loss 9.8508 LearningRate 0.1104 Epoch: 6 Global Step: 32520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:20:06,363-Speed 3510.37 samples/sec Loss 9.9583 LearningRate 0.1103 Epoch: 6 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:20:09,342-Speed 3437.82 samples/sec Loss 9.9794 LearningRate 0.1103 Epoch: 6 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:20:12,281-Speed 3485.58 samples/sec Loss 9.8293 LearningRate 0.1103 Epoch: 6 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-19 20:20:15,218-Speed 3487.34 samples/sec Loss 9.8431 LearningRate 0.1103 Epoch: 6 Global Step: 32560 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-19 20:20:18,132-Speed 3515.46 samples/sec Loss 9.7809 LearningRate 0.1102 Epoch: 6 Global Step: 32570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:21,061-Speed 3496.77 samples/sec Loss 9.9738 LearningRate 
0.1102 Epoch: 6 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:24,007-Speed 3476.30 samples/sec Loss 9.7485 LearningRate 0.1102 Epoch: 6 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:26,966-Speed 3462.63 samples/sec Loss 9.7893 LearningRate 0.1102 Epoch: 6 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:29,945-Speed 3438.86 samples/sec Loss 9.8923 LearningRate 0.1101 Epoch: 6 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:33,033-Speed 3316.85 samples/sec Loss 9.8594 LearningRate 0.1101 Epoch: 6 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:35,969-Speed 3488.46 samples/sec Loss 9.6788 LearningRate 0.1101 Epoch: 6 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:38,897-Speed 3497.78 samples/sec Loss 9.8387 LearningRate 0.1101 Epoch: 6 Global Step: 32640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:41,851-Speed 3467.83 samples/sec Loss 9.8874 LearningRate 0.1101 Epoch: 6 Global Step: 32650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:44,782-Speed 3494.60 samples/sec Loss 9.9746 LearningRate 0.1100 Epoch: 6 Global Step: 32660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:20:47,723-Speed 3483.29 samples/sec Loss 9.9031 LearningRate 0.1100 Epoch: 6 Global Step: 32670 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:20:50,644-Speed 3506.00 samples/sec Loss 9.6963 LearningRate 0.1100 Epoch: 6 Global Step: 32680 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:20:53,571-Speed 3499.85 samples/sec Loss 10.0217 LearningRate 0.1100 Epoch: 6 Global Step: 32690 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:20:56,481-Speed 3519.80 samples/sec Loss 9.8592 LearningRate 0.1099 Epoch: 6 Global Step: 32700 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-01-19 20:20:59,482-Speed 3413.57 samples/sec Loss 9.9877 LearningRate 0.1099 Epoch: 6 Global Step: 32710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:02,480-Speed 3416.12 samples/sec Loss 9.8785 LearningRate 0.1099 Epoch: 6 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:05,424-Speed 3479.71 samples/sec Loss 9.8827 LearningRate 0.1099 Epoch: 6 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:08,381-Speed 3463.04 samples/sec Loss 9.7601 LearningRate 0.1098 Epoch: 6 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:11,319-Speed 3487.17 samples/sec Loss 9.8739 LearningRate 0.1098 Epoch: 6 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:14,244-Speed 3500.90 samples/sec Loss 9.8242 LearningRate 0.1098 Epoch: 6 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:17,181-Speed 3488.32 samples/sec Loss 9.9089 LearningRate 0.1098 Epoch: 6 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:20,384-Speed 3197.50 samples/sec Loss 10.1046 LearningRate 0.1097 Epoch: 6 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:23,327-Speed 3480.41 samples/sec Loss 9.7777 LearningRate 0.1097 Epoch: 6 Global Step: 32790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:26,252-Speed 3501.04 samples/sec Loss 9.9873 LearningRate 0.1097 Epoch: 6 Global Step: 32800 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:21:29,288-Speed 3374.00 samples/sec Loss 9.8816 LearningRate 0.1097 Epoch: 6 Global Step: 32810 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:21:32,237-Speed 3473.26 samples/sec Loss 9.8568 LearningRate 0.1097 Epoch: 6 Global Step: 32820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 
20:21:35,175-Speed 3486.93 samples/sec Loss 9.9714 LearningRate 0.1096 Epoch: 6 Global Step: 32830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:38,105-Speed 3495.91 samples/sec Loss 9.8401 LearningRate 0.1096 Epoch: 6 Global Step: 32840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:41,074-Speed 3449.23 samples/sec Loss 9.8090 LearningRate 0.1096 Epoch: 6 Global Step: 32850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:44,008-Speed 3491.43 samples/sec Loss 9.8329 LearningRate 0.1096 Epoch: 6 Global Step: 32860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:46,945-Speed 3487.54 samples/sec Loss 9.9933 LearningRate 0.1095 Epoch: 6 Global Step: 32870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:49,946-Speed 3413.66 samples/sec Loss 10.0659 LearningRate 0.1095 Epoch: 6 Global Step: 32880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:52,905-Speed 3460.92 samples/sec Loss 9.9313 LearningRate 0.1095 Epoch: 6 Global Step: 32890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:55,907-Speed 3411.93 samples/sec Loss 10.0521 LearningRate 0.1095 Epoch: 6 Global Step: 32900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:21:58,837-Speed 3496.22 samples/sec Loss 9.7223 LearningRate 0.1094 Epoch: 6 Global Step: 32910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:01,761-Speed 3503.63 samples/sec Loss 9.9546 LearningRate 0.1094 Epoch: 6 Global Step: 32920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:22:04,699-Speed 3486.08 samples/sec Loss 9.8730 LearningRate 0.1094 Epoch: 6 Global Step: 32930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:07,637-Speed 3486.78 samples/sec Loss 9.7722 LearningRate 0.1094 Epoch: 6 Global Step: 32940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:10,607-Speed 3448.11 samples/sec Loss 9.8778 
LearningRate 0.1093 Epoch: 6 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:13,533-Speed 3501.06 samples/sec Loss 10.0738 LearningRate 0.1093 Epoch: 6 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:16,466-Speed 3492.55 samples/sec Loss 9.7918 LearningRate 0.1093 Epoch: 6 Global Step: 32970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:19,401-Speed 3489.36 samples/sec Loss 9.7839 LearningRate 0.1093 Epoch: 6 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:22,360-Speed 3462.18 samples/sec Loss 9.8329 LearningRate 0.1093 Epoch: 6 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:25,426-Speed 3340.71 samples/sec Loss 9.9872 LearningRate 0.1092 Epoch: 6 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:28,406-Speed 3437.51 samples/sec Loss 9.7740 LearningRate 0.1092 Epoch: 6 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:31,348-Speed 3480.82 samples/sec Loss 9.8327 LearningRate 0.1092 Epoch: 6 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:34,275-Speed 3499.98 samples/sec Loss 9.6446 LearningRate 0.1092 Epoch: 6 Global Step: 33030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:37,211-Speed 3489.03 samples/sec Loss 9.9137 LearningRate 0.1091 Epoch: 6 Global Step: 33040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:40,148-Speed 3486.43 samples/sec Loss 9.8146 LearningRate 0.1091 Epoch: 6 Global Step: 33050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:43,104-Speed 3465.33 samples/sec Loss 9.8851 LearningRate 0.1091 Epoch: 6 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:46,045-Speed 3482.99 samples/sec Loss 9.9011 LearningRate 0.1091 Epoch: 6 Global Step: 33070 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:48,992-Speed 3475.11 samples/sec Loss 9.7434 LearningRate 0.1090 Epoch: 6 Global Step: 33080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:51,962-Speed 3450.07 samples/sec Loss 9.7989 LearningRate 0.1090 Epoch: 6 Global Step: 33090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:54,953-Speed 3423.92 samples/sec Loss 9.9669 LearningRate 0.1090 Epoch: 6 Global Step: 33100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:22:57,880-Speed 3499.84 samples/sec Loss 9.9458 LearningRate 0.1090 Epoch: 6 Global Step: 33110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:00,812-Speed 3492.75 samples/sec Loss 9.8836 LearningRate 0.1090 Epoch: 6 Global Step: 33120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:03,726-Speed 3515.21 samples/sec Loss 9.9058 LearningRate 0.1089 Epoch: 6 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:06,668-Speed 3481.46 samples/sec Loss 9.9412 LearningRate 0.1089 Epoch: 6 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:09,594-Speed 3501.44 samples/sec Loss 9.8527 LearningRate 0.1089 Epoch: 6 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:12,577-Speed 3433.63 samples/sec Loss 9.9036 LearningRate 0.1089 Epoch: 6 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:15,521-Speed 3478.98 samples/sec Loss 9.9820 LearningRate 0.1088 Epoch: 6 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:18,455-Speed 3491.43 samples/sec Loss 9.8736 LearningRate 0.1088 Epoch: 6 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:21,442-Speed 3429.37 samples/sec Loss 9.8765 LearningRate 0.1088 Epoch: 6 Global Step: 33190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-01-19 20:23:24,386-Speed 3478.46 samples/sec Loss 9.7981 LearningRate 0.1088 Epoch: 6 Global Step: 33200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:27,316-Speed 3496.68 samples/sec Loss 9.9629 LearningRate 0.1087 Epoch: 6 Global Step: 33210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:30,241-Speed 3501.69 samples/sec Loss 10.0256 LearningRate 0.1087 Epoch: 6 Global Step: 33220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:33,168-Speed 3499.06 samples/sec Loss 9.8482 LearningRate 0.1087 Epoch: 6 Global Step: 33230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:36,092-Speed 3502.43 samples/sec Loss 9.6091 LearningRate 0.1087 Epoch: 6 Global Step: 33240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:39,031-Speed 3485.24 samples/sec Loss 9.8084 LearningRate 0.1086 Epoch: 6 Global Step: 33250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:41,988-Speed 3465.01 samples/sec Loss 9.7442 LearningRate 0.1086 Epoch: 6 Global Step: 33260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:45,005-Speed 3395.01 samples/sec Loss 10.0276 LearningRate 0.1086 Epoch: 6 Global Step: 33270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:47,985-Speed 3438.24 samples/sec Loss 9.7502 LearningRate 0.1086 Epoch: 6 Global Step: 33280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:23:50,912-Speed 3498.50 samples/sec Loss 9.9190 LearningRate 0.1086 Epoch: 6 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:53,852-Speed 3484.99 samples/sec Loss 10.1457 LearningRate 0.1085 Epoch: 6 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:56,784-Speed 3493.62 samples/sec Loss 9.9103 LearningRate 0.1085 Epoch: 6 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:23:59,715-Speed 3493.67 samples/sec Loss 9.7318 
LearningRate 0.1085 Epoch: 6 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:02,648-Speed 3493.38 samples/sec Loss 9.8322 LearningRate 0.1085 Epoch: 6 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:05,592-Speed 3478.53 samples/sec Loss 9.9317 LearningRate 0.1084 Epoch: 6 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:08,543-Speed 3471.46 samples/sec Loss 10.0713 LearningRate 0.1084 Epoch: 6 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:11,509-Speed 3452.67 samples/sec Loss 9.8886 LearningRate 0.1084 Epoch: 6 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:14,440-Speed 3494.79 samples/sec Loss 9.9578 LearningRate 0.1084 Epoch: 6 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:17,432-Speed 3424.28 samples/sec Loss 9.7962 LearningRate 0.1083 Epoch: 6 Global Step: 33380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:20,395-Speed 3457.12 samples/sec Loss 9.6479 LearningRate 0.1083 Epoch: 6 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:23,323-Speed 3497.70 samples/sec Loss 9.9537 LearningRate 0.1083 Epoch: 6 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:26,291-Speed 3451.23 samples/sec Loss 9.8157 LearningRate 0.1083 Epoch: 6 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:29,229-Speed 3486.31 samples/sec Loss 9.6275 LearningRate 0.1083 Epoch: 6 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:32,165-Speed 3488.93 samples/sec Loss 9.8262 LearningRate 0.1082 Epoch: 6 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:35,102-Speed 3486.94 samples/sec Loss 9.9037 LearningRate 0.1082 Epoch: 6 Global Step: 33440 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:38,038-Speed 3489.34 samples/sec Loss 9.9501 LearningRate 0.1082 Epoch: 6 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:40,982-Speed 3478.76 samples/sec Loss 9.7284 LearningRate 0.1082 Epoch: 6 Global Step: 33460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:43,912-Speed 3496.22 samples/sec Loss 9.8939 LearningRate 0.1081 Epoch: 6 Global Step: 33470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:46,861-Speed 3473.84 samples/sec Loss 9.7754 LearningRate 0.1081 Epoch: 6 Global Step: 33480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:49,780-Speed 3509.22 samples/sec Loss 9.8558 LearningRate 0.1081 Epoch: 6 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:52,737-Speed 3463.81 samples/sec Loss 9.8706 LearningRate 0.1081 Epoch: 6 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:55,664-Speed 3498.98 samples/sec Loss 9.4523 LearningRate 0.1080 Epoch: 6 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:24:58,593-Speed 3497.01 samples/sec Loss 9.7793 LearningRate 0.1080 Epoch: 6 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:01,556-Speed 3456.97 samples/sec Loss 9.6662 LearningRate 0.1080 Epoch: 6 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:04,542-Speed 3430.01 samples/sec Loss 9.7196 LearningRate 0.1080 Epoch: 6 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:07,544-Speed 3412.66 samples/sec Loss 9.6962 LearningRate 0.1080 Epoch: 6 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:10,474-Speed 3495.61 samples/sec Loss 9.6682 LearningRate 0.1079 Epoch: 6 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-19 20:25:13,402-Speed 3499.13 samples/sec Loss 9.7427 LearningRate 0.1079 Epoch: 6 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:16,348-Speed 3476.60 samples/sec Loss 9.9067 LearningRate 0.1079 Epoch: 6 Global Step: 33580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:19,282-Speed 3490.43 samples/sec Loss 9.9323 LearningRate 0.1079 Epoch: 6 Global Step: 33590 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:25:22,217-Speed 3490.58 samples/sec Loss 9.9095 LearningRate 0.1078 Epoch: 6 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:25,200-Speed 3433.21 samples/sec Loss 9.8921 LearningRate 0.1078 Epoch: 6 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:28,217-Speed 3395.65 samples/sec Loss 9.6635 LearningRate 0.1078 Epoch: 6 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:31,149-Speed 3493.61 samples/sec Loss 9.8887 LearningRate 0.1078 Epoch: 6 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:34,104-Speed 3466.30 samples/sec Loss 9.6274 LearningRate 0.1077 Epoch: 6 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:25:37,044-Speed 3484.25 samples/sec Loss 9.8412 LearningRate 0.1077 Epoch: 6 Global Step: 33650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:25:39,996-Speed 3468.92 samples/sec Loss 10.0995 LearningRate 0.1077 Epoch: 6 Global Step: 33660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:25:43,027-Speed 3381.83 samples/sec Loss 9.8380 LearningRate 0.1077 Epoch: 6 Global Step: 33670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:25:45,959-Speed 3492.36 samples/sec Loss 9.7942 LearningRate 0.1076 Epoch: 6 Global Step: 33680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:25:48,894-Speed 3489.70 samples/sec Loss 
9.7706 LearningRate 0.1076 Epoch: 6 Global Step: 33690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:25:51,822-Speed 3498.72 samples/sec Loss 9.7917 LearningRate 0.1076 Epoch: 6 Global Step: 33700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:25:54,760-Speed 3485.85 samples/sec Loss 9.8800 LearningRate 0.1076 Epoch: 6 Global Step: 33710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:25:57,737-Speed 3441.13 samples/sec Loss 9.5784 LearningRate 0.1076 Epoch: 6 Global Step: 33720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:26:00,665-Speed 3497.99 samples/sec Loss 9.8919 LearningRate 0.1075 Epoch: 6 Global Step: 33730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:26:03,596-Speed 3494.89 samples/sec Loss 9.8308 LearningRate 0.1075 Epoch: 6 Global Step: 33740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:26:06,576-Speed 3437.37 samples/sec Loss 9.8753 LearningRate 0.1075 Epoch: 6 Global Step: 33750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:26:09,502-Speed 3500.98 samples/sec Loss 9.7544 LearningRate 0.1075 Epoch: 6 Global Step: 33760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:26:12,430-Speed 3498.94 samples/sec Loss 9.7816 LearningRate 0.1074 Epoch: 6 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:15,373-Speed 3480.11 samples/sec Loss 9.9513 LearningRate 0.1074 Epoch: 6 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:18,343-Speed 3503.97 samples/sec Loss 9.7118 LearningRate 0.1074 Epoch: 6 Global Step: 33790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:21,279-Speed 3488.04 samples/sec Loss 9.8587 LearningRate 0.1074 Epoch: 6 Global Step: 33800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:24,208-Speed 3496.71 samples/sec Loss 9.8965 LearningRate 0.1073 Epoch: 6 Global Step: 33810 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:27,228-Speed 3463.98 samples/sec Loss 9.8966 LearningRate 0.1073 Epoch: 6 Global Step: 33820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:30,182-Speed 3467.26 samples/sec Loss 9.5552 LearningRate 0.1073 Epoch: 6 Global Step: 33830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:33,183-Speed 3476.20 samples/sec Loss 9.6539 LearningRate 0.1073 Epoch: 6 Global Step: 33840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:36,116-Speed 3491.70 samples/sec Loss 9.5694 LearningRate 0.1073 Epoch: 6 Global Step: 33850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:39,111-Speed 3419.62 samples/sec Loss 9.7485 LearningRate 0.1072 Epoch: 6 Global Step: 33860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:26:42,287-Speed 3462.42 samples/sec Loss 9.7640 LearningRate 0.1072 Epoch: 6 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:26:45,246-Speed 3461.09 samples/sec Loss 9.7634 LearningRate 0.1072 Epoch: 6 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:26:48,185-Speed 3484.77 samples/sec Loss 9.7399 LearningRate 0.1072 Epoch: 6 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:26:51,119-Speed 3491.61 samples/sec Loss 9.8165 LearningRate 0.1071 Epoch: 6 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:26:54,051-Speed 3492.68 samples/sec Loss 9.7307 LearningRate 0.1071 Epoch: 6 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:26:57,013-Speed 3458.33 samples/sec Loss 9.5857 LearningRate 0.1071 Epoch: 6 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:26:59,962-Speed 3473.48 samples/sec Loss 9.7949 LearningRate 0.1071 Epoch: 6 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 
20:27:02,970-Speed 3405.02 samples/sec Loss 9.6844 LearningRate 0.1070 Epoch: 6 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:27:05,903-Speed 3492.38 samples/sec Loss 9.8414 LearningRate 0.1070 Epoch: 6 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:27:08,843-Speed 3484.84 samples/sec Loss 9.6016 LearningRate 0.1070 Epoch: 6 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:27:11,775-Speed 3493.46 samples/sec Loss 9.6668 LearningRate 0.1070 Epoch: 6 Global Step: 33970 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:27:14,707-Speed 3493.07 samples/sec Loss 9.6648 LearningRate 0.1070 Epoch: 6 Global Step: 33980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:27:17,672-Speed 3455.62 samples/sec Loss 9.9013 LearningRate 0.1069 Epoch: 6 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:27:20,608-Speed 3487.85 samples/sec Loss 9.8416 LearningRate 0.1069 Epoch: 6 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-19 20:28:03,258-[lfw][34000]XNorm: 21.041891
Training: 2022-01-19 20:28:03,259-[lfw][34000]Accuracy-Flip: 0.99533+-0.00427
Training: 2022-01-19 20:28:03,259-[lfw][34000]Accuracy-Highest: 0.99700
Training: 2022-01-19 20:28:53,227-[cfp_fp][34000]XNorm: 18.560882
Training: 2022-01-19 20:28:53,228-[cfp_fp][34000]Accuracy-Flip: 0.95743+-0.00837
Training: 2022-01-19 20:28:53,228-[cfp_fp][34000]Accuracy-Highest: 0.95743
Training: 2022-01-19 20:29:35,929-[agedb_30][34000]XNorm: 20.704089
Training: 2022-01-19 20:29:35,930-[agedb_30][34000]Accuracy-Flip: 0.96700+-0.00891
Training: 2022-01-19 20:29:35,930-[agedb_30][34000]Accuracy-Highest: 0.96983
Training: 2022-01-19 20:29:38,929-Speed 74.03 samples/sec Loss 9.7157 LearningRate 0.1069 Epoch: 6 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:29:41,867-Speed 3485.76 samples/sec
Loss 9.7785 LearningRate 0.1069 Epoch: 6 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:29:44,800-Speed 3492.06 samples/sec Loss 9.8713 LearningRate 0.1068 Epoch: 6 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:29:47,714-Speed 3515.64 samples/sec Loss 9.8878 LearningRate 0.1068 Epoch: 6 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:29:50,632-Speed 3509.77 samples/sec Loss 9.8263 LearningRate 0.1068 Epoch: 6 Global Step: 34050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:29:53,554-Speed 3506.10 samples/sec Loss 9.6938 LearningRate 0.1068 Epoch: 6 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:29:56,477-Speed 3504.67 samples/sec Loss 9.8796 LearningRate 0.1067 Epoch: 6 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:29:59,384-Speed 3523.71 samples/sec Loss 9.7671 LearningRate 0.1067 Epoch: 6 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:02,316-Speed 3493.76 samples/sec Loss 9.9152 LearningRate 0.1067 Epoch: 6 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:05,240-Speed 3502.60 samples/sec Loss 9.6776 LearningRate 0.1067 Epoch: 6 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:08,187-Speed 3476.02 samples/sec Loss 9.7794 LearningRate 0.1067 Epoch: 6 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:11,146-Speed 3460.98 samples/sec Loss 9.8209 LearningRate 0.1066 Epoch: 6 Global Step: 34120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:14,082-Speed 3488.48 samples/sec Loss 9.7509 LearningRate 0.1066 Epoch: 6 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:17,041-Speed 3463.03 samples/sec Loss 9.7655 LearningRate 0.1066 Epoch: 6 Global Step: 
34140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:19,991-Speed 3472.09 samples/sec Loss 9.6833 LearningRate 0.1066 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:22,963-Speed 3445.72 samples/sec Loss 9.8316 LearningRate 0.1065 Epoch: 6 Global Step: 34160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:25,903-Speed 3484.81 samples/sec Loss 9.8051 LearningRate 0.1065 Epoch: 6 Global Step: 34170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:28,853-Speed 3472.21 samples/sec Loss 9.7192 LearningRate 0.1065 Epoch: 6 Global Step: 34180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:31,794-Speed 3482.28 samples/sec Loss 9.7449 LearningRate 0.1065 Epoch: 6 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:34,769-Speed 3442.19 samples/sec Loss 9.5424 LearningRate 0.1064 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:37,777-Speed 3406.23 samples/sec Loss 9.7271 LearningRate 0.1064 Epoch: 6 Global Step: 34210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:40,825-Speed 3360.15 samples/sec Loss 9.6586 LearningRate 0.1064 Epoch: 6 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:43,781-Speed 3464.43 samples/sec Loss 9.6889 LearningRate 0.1064 Epoch: 6 Global Step: 34230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:46,778-Speed 3418.47 samples/sec Loss 9.6084 LearningRate 0.1064 Epoch: 6 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:49,719-Speed 3482.04 samples/sec Loss 9.6613 LearningRate 0.1063 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:30:52,645-Speed 3501.87 samples/sec Loss 9.5860 LearningRate 0.1063 Epoch: 6 Global Step: 34260 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-19 20:30:55,569-Speed 3502.84 samples/sec Loss 9.7247 LearningRate 0.1063 Epoch: 6 Global Step: 34270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:30:58,498-Speed 3496.47 samples/sec Loss 9.6299 LearningRate 0.1063 Epoch: 6 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:01,426-Speed 3498.01 samples/sec Loss 9.7424 LearningRate 0.1062 Epoch: 6 Global Step: 34290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:04,366-Speed 3483.57 samples/sec Loss 9.5975 LearningRate 0.1062 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:07,294-Speed 3499.15 samples/sec Loss 9.7287 LearningRate 0.1062 Epoch: 6 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:10,223-Speed 3496.32 samples/sec Loss 9.6339 LearningRate 0.1062 Epoch: 6 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:13,144-Speed 3507.24 samples/sec Loss 9.8262 LearningRate 0.1061 Epoch: 6 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:16,087-Speed 3480.84 samples/sec Loss 9.8274 LearningRate 0.1061 Epoch: 6 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:19,011-Speed 3503.31 samples/sec Loss 9.6789 LearningRate 0.1061 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:21,950-Speed 3484.61 samples/sec Loss 9.8225 LearningRate 0.1061 Epoch: 6 Global Step: 34360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:31:24,860-Speed 3519.79 samples/sec Loss 9.6209 LearningRate 0.1061 Epoch: 6 Global Step: 34370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:27,848-Speed 3427.99 samples/sec Loss 9.8574 LearningRate 0.1060 Epoch: 6 Global Step: 34380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:30,800-Speed 3469.58 samples/sec Loss 
9.7911 LearningRate 0.1060 Epoch: 6 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:33,724-Speed 3503.08 samples/sec Loss 9.7848 LearningRate 0.1060 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:36,687-Speed 3457.34 samples/sec Loss 9.9510 LearningRate 0.1060 Epoch: 6 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:39,663-Speed 3441.68 samples/sec Loss 9.6921 LearningRate 0.1059 Epoch: 6 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:42,617-Speed 3467.61 samples/sec Loss 9.7472 LearningRate 0.1059 Epoch: 6 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:45,545-Speed 3498.65 samples/sec Loss 9.7735 LearningRate 0.1059 Epoch: 6 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:48,480-Speed 3489.83 samples/sec Loss 9.6481 LearningRate 0.1059 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:51,401-Speed 3506.36 samples/sec Loss 9.7663 LearningRate 0.1058 Epoch: 6 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:31:54,326-Speed 3502.17 samples/sec Loss 9.6536 LearningRate 0.1058 Epoch: 6 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:31:57,290-Speed 3456.51 samples/sec Loss 9.7332 LearningRate 0.1058 Epoch: 6 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:00,234-Speed 3479.20 samples/sec Loss 9.4611 LearningRate 0.1058 Epoch: 6 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:03,163-Speed 3497.29 samples/sec Loss 9.5901 LearningRate 0.1058 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:06,086-Speed 3503.25 samples/sec Loss 9.5955 LearningRate 0.1057 Epoch: 6 Global Step: 34510 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:09,013-Speed 3500.23 samples/sec Loss 9.6867 LearningRate 0.1057 Epoch: 6 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:11,938-Speed 3501.88 samples/sec Loss 9.7336 LearningRate 0.1057 Epoch: 6 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:14,862-Speed 3502.87 samples/sec Loss 9.7728 LearningRate 0.1057 Epoch: 6 Global Step: 34540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:17,806-Speed 3479.67 samples/sec Loss 9.6668 LearningRate 0.1056 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:20,750-Speed 3478.45 samples/sec Loss 9.6924 LearningRate 0.1056 Epoch: 6 Global Step: 34560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:23,716-Speed 3454.17 samples/sec Loss 9.6581 LearningRate 0.1056 Epoch: 6 Global Step: 34570 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:32:26,645-Speed 3496.63 samples/sec Loss 9.6472 LearningRate 0.1056 Epoch: 6 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:29,606-Speed 3459.37 samples/sec Loss 9.7115 LearningRate 0.1055 Epoch: 6 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:32,601-Speed 3419.88 samples/sec Loss 9.6427 LearningRate 0.1055 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:35,554-Speed 3469.39 samples/sec Loss 9.6609 LearningRate 0.1055 Epoch: 6 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:38,508-Speed 3467.63 samples/sec Loss 9.6974 LearningRate 0.1055 Epoch: 6 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:41,458-Speed 3471.47 samples/sec Loss 9.6338 LearningRate 0.1055 Epoch: 6 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-19 20:32:44,430-Speed 3446.99 samples/sec Loss 9.6607 LearningRate 0.1054 Epoch: 6 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:47,412-Speed 3435.04 samples/sec Loss 9.6755 LearningRate 0.1054 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:50,359-Speed 3475.14 samples/sec Loss 9.7679 LearningRate 0.1054 Epoch: 6 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:32:53,302-Speed 3481.70 samples/sec Loss 9.8373 LearningRate 0.1054 Epoch: 6 Global Step: 34670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:32:56,262-Speed 3459.54 samples/sec Loss 9.7034 LearningRate 0.1053 Epoch: 6 Global Step: 34680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:32:59,216-Speed 3467.89 samples/sec Loss 9.7718 LearningRate 0.1053 Epoch: 6 Global Step: 34690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:33:02,151-Speed 3489.41 samples/sec Loss 9.7166 LearningRate 0.1053 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:33:05,076-Speed 3502.39 samples/sec Loss 9.6173 LearningRate 0.1053 Epoch: 6 Global Step: 34710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:33:08,022-Speed 3478.09 samples/sec Loss 9.6071 LearningRate 0.1052 Epoch: 6 Global Step: 34720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:33:10,971-Speed 3473.22 samples/sec Loss 9.8105 LearningRate 0.1052 Epoch: 6 Global Step: 34730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:33:13,915-Speed 3478.82 samples/sec Loss 9.6151 LearningRate 0.1052 Epoch: 6 Global Step: 34740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:33:16,945-Speed 3379.97 samples/sec Loss 9.8106 LearningRate 0.1052 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:33:19,869-Speed 3502.65 samples/sec Loss 9.5866 
LearningRate 0.1052 Epoch: 6 Global Step: 34760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:33:22,810-Speed 3483.92 samples/sec Loss 9.6436 LearningRate 0.1051 Epoch: 6 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:25,743-Speed 3491.74 samples/sec Loss 9.5610 LearningRate 0.1051 Epoch: 6 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:28,668-Speed 3502.23 samples/sec Loss 9.7386 LearningRate 0.1051 Epoch: 6 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:31,603-Speed 3489.87 samples/sec Loss 9.5845 LearningRate 0.1051 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:34,547-Speed 3479.48 samples/sec Loss 9.7301 LearningRate 0.1050 Epoch: 6 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:37,479-Speed 3493.97 samples/sec Loss 9.7199 LearningRate 0.1050 Epoch: 6 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:40,410-Speed 3494.10 samples/sec Loss 9.6721 LearningRate 0.1050 Epoch: 6 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:43,352-Speed 3483.08 samples/sec Loss 9.4804 LearningRate 0.1050 Epoch: 6 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:46,294-Speed 3481.03 samples/sec Loss 9.6588 LearningRate 0.1050 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:49,231-Speed 3487.69 samples/sec Loss 9.7655 LearningRate 0.1049 Epoch: 6 Global Step: 34860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:52,196-Speed 3454.73 samples/sec Loss 9.6240 LearningRate 0.1049 Epoch: 6 Global Step: 34870 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:33:55,137-Speed 3482.05 samples/sec Loss 9.6898 LearningRate 0.1049 Epoch: 6 Global Step: 34880 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:33:58,071-Speed 3492.46 samples/sec Loss 9.5962 LearningRate 0.1049 Epoch: 6 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:01,036-Speed 3454.38 samples/sec Loss 9.5926 LearningRate 0.1048 Epoch: 6 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:03,993-Speed 3464.75 samples/sec Loss 9.6293 LearningRate 0.1048 Epoch: 6 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:06,923-Speed 3495.49 samples/sec Loss 9.5710 LearningRate 0.1048 Epoch: 6 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:09,858-Speed 3489.68 samples/sec Loss 9.7419 LearningRate 0.1048 Epoch: 6 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:12,816-Speed 3463.24 samples/sec Loss 9.8318 LearningRate 0.1047 Epoch: 6 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:15,764-Speed 3473.41 samples/sec Loss 9.7598 LearningRate 0.1047 Epoch: 6 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:18,702-Speed 3486.70 samples/sec Loss 9.5052 LearningRate 0.1047 Epoch: 6 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:21,641-Speed 3485.31 samples/sec Loss 9.8094 LearningRate 0.1047 Epoch: 6 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:24,583-Speed 3481.03 samples/sec Loss 9.7070 LearningRate 0.1047 Epoch: 6 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:27,508-Speed 3502.16 samples/sec Loss 9.6137 LearningRate 0.1046 Epoch: 6 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:34:30,449-Speed 3483.40 samples/sec Loss 9.7813 LearningRate 0.1046 Epoch: 6 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-01-19 20:34:33,388-Speed 3485.47 samples/sec Loss 9.6584 LearningRate 0.1046 Epoch: 6 Global Step: 35010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:36,324-Speed 3488.64 samples/sec Loss 9.5529 LearningRate 0.1046 Epoch: 6 Global Step: 35020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:39,265-Speed 3482.64 samples/sec Loss 9.5000 LearningRate 0.1045 Epoch: 6 Global Step: 35030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:42,200-Speed 3489.05 samples/sec Loss 9.4475 LearningRate 0.1045 Epoch: 6 Global Step: 35040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:45,161-Speed 3459.74 samples/sec Loss 9.7311 LearningRate 0.1045 Epoch: 6 Global Step: 35050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:48,089-Speed 3498.15 samples/sec Loss 9.6873 LearningRate 0.1045 Epoch: 6 Global Step: 35060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:51,017-Speed 3497.93 samples/sec Loss 9.6737 LearningRate 0.1044 Epoch: 6 Global Step: 35070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:53,948-Speed 3495.64 samples/sec Loss 9.6659 LearningRate 0.1044 Epoch: 6 Global Step: 35080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:56,907-Speed 3461.78 samples/sec Loss 9.6445 LearningRate 0.1044 Epoch: 6 Global Step: 35090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:34:59,836-Speed 3496.46 samples/sec Loss 9.6885 LearningRate 0.1044 Epoch: 6 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:02,765-Speed 3496.90 samples/sec Loss 9.6876 LearningRate 0.1044 Epoch: 6 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:05,709-Speed 3479.29 samples/sec Loss 9.5828 LearningRate 0.1043 Epoch: 6 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:08,636-Speed 3500.95 samples/sec Loss 9.5147 
LearningRate 0.1043 Epoch: 6 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:11,567-Speed 3494.23 samples/sec Loss 9.5957 LearningRate 0.1043 Epoch: 6 Global Step: 35140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:14,504-Speed 3486.78 samples/sec Loss 9.6263 LearningRate 0.1043 Epoch: 6 Global Step: 35150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:17,431-Speed 3502.26 samples/sec Loss 9.7680 LearningRate 0.1042 Epoch: 6 Global Step: 35160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:20,387-Speed 3464.12 samples/sec Loss 9.5840 LearningRate 0.1042 Epoch: 6 Global Step: 35170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:23,312-Speed 3502.71 samples/sec Loss 9.6412 LearningRate 0.1042 Epoch: 6 Global Step: 35180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:26,242-Speed 3495.83 samples/sec Loss 9.7603 LearningRate 0.1042 Epoch: 6 Global Step: 35190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:29,174-Speed 3493.68 samples/sec Loss 9.6181 LearningRate 0.1041 Epoch: 6 Global Step: 35200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:35:32,127-Speed 3468.10 samples/sec Loss 9.6160 LearningRate 0.1041 Epoch: 6 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:35:35,066-Speed 3485.60 samples/sec Loss 9.5871 LearningRate 0.1041 Epoch: 6 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:35:38,014-Speed 3473.99 samples/sec Loss 9.7072 LearningRate 0.1041 Epoch: 6 Global Step: 35230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:35:40,965-Speed 3470.97 samples/sec Loss 9.8026 LearningRate 0.1041 Epoch: 6 Global Step: 35240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:35:43,895-Speed 3497.32 samples/sec Loss 9.5357 LearningRate 0.1040 Epoch: 6 Global Step: 35250 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-01-19 20:35:46,843-Speed 3474.59 samples/sec Loss 9.6482 LearningRate 0.1040 Epoch: 6 Global Step: 35260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:35:49,797-Speed 3467.91 samples/sec Loss 9.7592 LearningRate 0.1040 Epoch: 6 Global Step: 35270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:35:52,733-Speed 3489.21 samples/sec Loss 9.7996 LearningRate 0.1040 Epoch: 6 Global Step: 35280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:35:55,666-Speed 3491.75 samples/sec Loss 9.8141 LearningRate 0.1039 Epoch: 6 Global Step: 35290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:35:58,596-Speed 3495.66 samples/sec Loss 9.7570 LearningRate 0.1039 Epoch: 6 Global Step: 35300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:36:01,528-Speed 3493.15 samples/sec Loss 9.5029 LearningRate 0.1039 Epoch: 6 Global Step: 35310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:36:04,464-Speed 3489.52 samples/sec Loss 9.5315 LearningRate 0.1039 Epoch: 6 Global Step: 35320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:36:07,396-Speed 3493.05 samples/sec Loss 9.4815 LearningRate 0.1039 Epoch: 6 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:10,325-Speed 3497.26 samples/sec Loss 9.4995 LearningRate 0.1038 Epoch: 6 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:13,268-Speed 3480.59 samples/sec Loss 9.7323 LearningRate 0.1038 Epoch: 6 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:16,209-Speed 3482.06 samples/sec Loss 9.4345 LearningRate 0.1038 Epoch: 6 Global Step: 35360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:19,140-Speed 3495.63 samples/sec Loss 9.4130 LearningRate 0.1038 Epoch: 6 Global Step: 35370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 
20:36:22,093-Speed 3468.71 samples/sec Loss 9.5874 LearningRate 0.1037 Epoch: 6 Global Step: 35380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:25,055-Speed 3457.76 samples/sec Loss 9.5833 LearningRate 0.1037 Epoch: 6 Global Step: 35390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:28,089-Speed 3375.66 samples/sec Loss 9.6209 LearningRate 0.1037 Epoch: 6 Global Step: 35400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:40,247-Speed 842.38 samples/sec Loss 9.3702 LearningRate 0.1037 Epoch: 7 Global Step: 35410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:43,207-Speed 3460.62 samples/sec Loss 8.8944 LearningRate 0.1036 Epoch: 7 Global Step: 35420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:36:46,168-Speed 3458.56 samples/sec Loss 8.8247 LearningRate 0.1036 Epoch: 7 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:36:49,103-Speed 3489.50 samples/sec Loss 9.0344 LearningRate 0.1036 Epoch: 7 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:36:52,088-Speed 3431.91 samples/sec Loss 8.8818 LearningRate 0.1036 Epoch: 7 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:36:55,108-Speed 3391.08 samples/sec Loss 8.8759 LearningRate 0.1036 Epoch: 7 Global Step: 35460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:36:58,042-Speed 3491.57 samples/sec Loss 8.7844 LearningRate 0.1035 Epoch: 7 Global Step: 35470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:00,976-Speed 3490.94 samples/sec Loss 8.9465 LearningRate 0.1035 Epoch: 7 Global Step: 35480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:03,930-Speed 3467.79 samples/sec Loss 8.7474 LearningRate 0.1035 Epoch: 7 Global Step: 35490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:06,865-Speed 3490.08 samples/sec Loss 8.8858 
LearningRate 0.1035 Epoch: 7 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:09,799-Speed 3489.93 samples/sec Loss 8.8664 LearningRate 0.1034 Epoch: 7 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:12,787-Speed 3428.49 samples/sec Loss 8.9413 LearningRate 0.1034 Epoch: 7 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:15,727-Speed 3483.89 samples/sec Loss 9.0173 LearningRate 0.1034 Epoch: 7 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:18,694-Speed 3453.68 samples/sec Loss 9.0759 LearningRate 0.1034 Epoch: 7 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:21,666-Speed 3446.44 samples/sec Loss 9.1802 LearningRate 0.1034 Epoch: 7 Global Step: 35550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:24,670-Speed 3410.27 samples/sec Loss 9.0914 LearningRate 0.1033 Epoch: 7 Global Step: 35560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:27,607-Speed 3487.51 samples/sec Loss 9.0363 LearningRate 0.1033 Epoch: 7 Global Step: 35570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:30,561-Speed 3467.80 samples/sec Loss 8.9375 LearningRate 0.1033 Epoch: 7 Global Step: 35580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:33,493-Speed 3493.54 samples/sec Loss 9.1074 LearningRate 0.1033 Epoch: 7 Global Step: 35590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:37:36,500-Speed 3405.74 samples/sec Loss 9.0483 LearningRate 0.1032 Epoch: 7 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:39,471-Speed 3447.60 samples/sec Loss 9.0972 LearningRate 0.1032 Epoch: 7 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:42,543-Speed 3334.08 samples/sec Loss 8.9816 LearningRate 0.1032 Epoch: 7 Global Step: 35620 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-01-19 20:37:45,489-Speed 3477.00 samples/sec Loss 8.9700 LearningRate 0.1032 Epoch: 7 Global Step: 35630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:48,433-Speed 3479.25 samples/sec Loss 9.1117 LearningRate 0.1031 Epoch: 7 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:51,368-Speed 3490.22 samples/sec Loss 9.0676 LearningRate 0.1031 Epoch: 7 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:54,340-Speed 3446.30 samples/sec Loss 9.1456 LearningRate 0.1031 Epoch: 7 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:37:57,288-Speed 3474.86 samples/sec Loss 9.1038 LearningRate 0.1031 Epoch: 7 Global Step: 35670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:00,230-Speed 3481.06 samples/sec Loss 9.1549 LearningRate 0.1031 Epoch: 7 Global Step: 35680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:03,201-Speed 3449.17 samples/sec Loss 9.3224 LearningRate 0.1030 Epoch: 7 Global Step: 35690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:06,323-Speed 3280.57 samples/sec Loss 9.2285 LearningRate 0.1030 Epoch: 7 Global Step: 35700 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:38:09,269-Speed 3477.97 samples/sec Loss 9.2326 LearningRate 0.1030 Epoch: 7 Global Step: 35710 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:38:12,199-Speed 3495.56 samples/sec Loss 9.1637 LearningRate 0.1030 Epoch: 7 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:15,132-Speed 3492.25 samples/sec Loss 9.2530 LearningRate 0.1029 Epoch: 7 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:18,091-Speed 3461.62 samples/sec Loss 9.2749 LearningRate 0.1029 Epoch: 7 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 
20:38:21,052-Speed 3459.11 samples/sec Loss 9.2342 LearningRate 0.1029 Epoch: 7 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:24,019-Speed 3453.21 samples/sec Loss 9.1553 LearningRate 0.1029 Epoch: 7 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:26,984-Speed 3454.10 samples/sec Loss 9.2278 LearningRate 0.1029 Epoch: 7 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:29,956-Speed 3446.74 samples/sec Loss 9.3795 LearningRate 0.1028 Epoch: 7 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:32,928-Speed 3446.46 samples/sec Loss 9.1124 LearningRate 0.1028 Epoch: 7 Global Step: 35790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:35,948-Speed 3391.42 samples/sec Loss 9.2109 LearningRate 0.1028 Epoch: 7 Global Step: 35800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:38,914-Speed 3452.99 samples/sec Loss 9.2326 LearningRate 0.1028 Epoch: 7 Global Step: 35810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:41,852-Speed 3486.97 samples/sec Loss 9.0984 LearningRate 0.1027 Epoch: 7 Global Step: 35820 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:38:44,771-Speed 3508.38 samples/sec Loss 9.2338 LearningRate 0.1027 Epoch: 7 Global Step: 35830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:47,713-Speed 3481.91 samples/sec Loss 9.3015 LearningRate 0.1027 Epoch: 7 Global Step: 35840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:50,648-Speed 3490.25 samples/sec Loss 9.2995 LearningRate 0.1027 Epoch: 7 Global Step: 35850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:53,588-Speed 3483.26 samples/sec Loss 9.3380 LearningRate 0.1026 Epoch: 7 Global Step: 35860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:56,555-Speed 3452.25 samples/sec Loss 9.3607 
LearningRate 0.1026 Epoch: 7 Global Step: 35870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:38:59,563-Speed 3405.75 samples/sec Loss 9.3602 LearningRate 0.1026 Epoch: 7 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:02,521-Speed 3462.28 samples/sec Loss 9.2392 LearningRate 0.1026 Epoch: 7 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:05,509-Speed 3428.50 samples/sec Loss 9.2910 LearningRate 0.1026 Epoch: 7 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:08,481-Speed 3447.32 samples/sec Loss 9.2261 LearningRate 0.1025 Epoch: 7 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:11,446-Speed 3454.43 samples/sec Loss 9.3467 LearningRate 0.1025 Epoch: 7 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:14,393-Speed 3474.99 samples/sec Loss 9.3963 LearningRate 0.1025 Epoch: 7 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:17,342-Speed 3473.41 samples/sec Loss 9.3722 LearningRate 0.1025 Epoch: 7 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:20,318-Speed 3442.19 samples/sec Loss 9.2884 LearningRate 0.1024 Epoch: 7 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:23,264-Speed 3477.14 samples/sec Loss 9.2879 LearningRate 0.1024 Epoch: 7 Global Step: 35960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:26,206-Speed 3480.94 samples/sec Loss 9.3416 LearningRate 0.1024 Epoch: 7 Global Step: 35970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:29,178-Speed 3447.72 samples/sec Loss 9.2758 LearningRate 0.1024 Epoch: 7 Global Step: 35980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:32,162-Speed 3432.34 samples/sec Loss 9.2654 LearningRate 0.1024 Epoch: 7 Global Step: 35990 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:39:35,185-Speed 3388.31 samples/sec Loss 9.3985 LearningRate 0.1023 Epoch: 7 Global Step: 36000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:40:18,200-[lfw][36000]XNorm: 22.915930 Training: 2022-01-19 20:40:18,201-[lfw][36000]Accuracy-Flip: 0.99583+-0.00344 Training: 2022-01-19 20:40:18,201-[lfw][36000]Accuracy-Highest: 0.99700 Training: 2022-01-19 20:41:08,280-[cfp_fp][36000]XNorm: 19.482311 Training: 2022-01-19 20:41:08,281-[cfp_fp][36000]Accuracy-Flip: 0.94929+-0.01136 Training: 2022-01-19 20:41:08,282-[cfp_fp][36000]Accuracy-Highest: 0.95743 Training: 2022-01-19 20:41:51,444-[agedb_30][36000]XNorm: 22.410325 Training: 2022-01-19 20:41:51,444-[agedb_30][36000]Accuracy-Flip: 0.97117+-0.00654 Training: 2022-01-19 20:41:51,445-[agedb_30][36000]Accuracy-Highest: 0.97117 Training: 2022-01-19 20:41:54,366-Speed 73.57 samples/sec Loss 9.3909 LearningRate 0.1023 Epoch: 7 Global Step: 36010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:41:57,284-Speed 3510.57 samples/sec Loss 9.1863 LearningRate 0.1023 Epoch: 7 Global Step: 36020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:00,235-Speed 3470.44 samples/sec Loss 9.4108 LearningRate 0.1023 Epoch: 7 Global Step: 36030 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:42:03,202-Speed 3452.65 samples/sec Loss 9.3442 LearningRate 0.1022 Epoch: 7 Global Step: 36040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:06,158-Speed 3464.98 samples/sec Loss 9.2802 LearningRate 0.1022 Epoch: 7 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:09,070-Speed 3518.17 samples/sec Loss 9.3587 LearningRate 0.1022 Epoch: 7 Global Step: 36060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:12,039-Speed 3449.43 samples/sec Loss 9.5136 LearningRate 0.1022 Epoch: 7 Global Step: 36070 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-19 20:42:14,975-Speed 3488.94 samples/sec Loss 9.3470 LearningRate 0.1022 Epoch: 7 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:17,901-Speed 3500.59 samples/sec Loss 9.5014 LearningRate 0.1021 Epoch: 7 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:20,827-Speed 3501.01 samples/sec Loss 9.2159 LearningRate 0.1021 Epoch: 7 Global Step: 36100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:23,782-Speed 3465.93 samples/sec Loss 9.4795 LearningRate 0.1021 Epoch: 7 Global Step: 36110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:26,790-Speed 3405.63 samples/sec Loss 9.3903 LearningRate 0.1021 Epoch: 7 Global Step: 36120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:29,732-Speed 3480.97 samples/sec Loss 9.5579 LearningRate 0.1020 Epoch: 7 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:32,702-Speed 3449.86 samples/sec Loss 9.4271 LearningRate 0.1020 Epoch: 7 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:35,633-Speed 3494.15 samples/sec Loss 9.3668 LearningRate 0.1020 Epoch: 7 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:42:38,583-Speed 3472.98 samples/sec Loss 9.3223 LearningRate 0.1020 Epoch: 7 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:41,547-Speed 3455.70 samples/sec Loss 9.1793 LearningRate 0.1019 Epoch: 7 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:44,497-Speed 3472.01 samples/sec Loss 9.5075 LearningRate 0.1019 Epoch: 7 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:47,426-Speed 3496.12 samples/sec Loss 9.4426 LearningRate 0.1019 Epoch: 7 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:50,368-Speed 3481.86 samples/sec 
Loss 9.3271 LearningRate 0.1019 Epoch: 7 Global Step: 36200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:53,313-Speed 3477.78 samples/sec Loss 9.5105 LearningRate 0.1019 Epoch: 7 Global Step: 36210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:56,286-Speed 3445.31 samples/sec Loss 9.3695 LearningRate 0.1018 Epoch: 7 Global Step: 36220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:42:59,217-Speed 3495.55 samples/sec Loss 9.4992 LearningRate 0.1018 Epoch: 7 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:02,148-Speed 3493.77 samples/sec Loss 9.4505 LearningRate 0.1018 Epoch: 7 Global Step: 36240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:05,136-Speed 3427.89 samples/sec Loss 9.4662 LearningRate 0.1018 Epoch: 7 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:08,118-Speed 3435.94 samples/sec Loss 9.5660 LearningRate 0.1017 Epoch: 7 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:11,092-Speed 3443.39 samples/sec Loss 9.4309 LearningRate 0.1017 Epoch: 7 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:14,022-Speed 3496.47 samples/sec Loss 9.3699 LearningRate 0.1017 Epoch: 7 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:16,951-Speed 3496.84 samples/sec Loss 9.5455 LearningRate 0.1017 Epoch: 7 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:19,907-Speed 3464.45 samples/sec Loss 9.3682 LearningRate 0.1017 Epoch: 7 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:22,836-Speed 3497.65 samples/sec Loss 9.3221 LearningRate 0.1016 Epoch: 7 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:25,787-Speed 3471.44 samples/sec Loss 9.3698 LearningRate 0.1016 Epoch: 7 Global Step: 
36320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:28,744-Speed 3464.24 samples/sec Loss 9.4479 LearningRate 0.1016 Epoch: 7 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:31,680-Speed 3488.00 samples/sec Loss 9.3662 LearningRate 0.1016 Epoch: 7 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:34,663-Speed 3433.37 samples/sec Loss 9.2566 LearningRate 0.1015 Epoch: 7 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:37,628-Speed 3455.07 samples/sec Loss 9.5805 LearningRate 0.1015 Epoch: 7 Global Step: 36360 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:43:40,641-Speed 3399.29 samples/sec Loss 9.4015 LearningRate 0.1015 Epoch: 7 Global Step: 36370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:43,576-Speed 3490.30 samples/sec Loss 9.3683 LearningRate 0.1015 Epoch: 7 Global Step: 36380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:46,504-Speed 3497.83 samples/sec Loss 9.3935 LearningRate 0.1015 Epoch: 7 Global Step: 36390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:49,450-Speed 3477.34 samples/sec Loss 9.2496 LearningRate 0.1014 Epoch: 7 Global Step: 36400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:52,389-Speed 3484.75 samples/sec Loss 9.4454 LearningRate 0.1014 Epoch: 7 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:55,327-Speed 3486.98 samples/sec Loss 9.3680 LearningRate 0.1014 Epoch: 7 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:43:58,261-Speed 3491.31 samples/sec Loss 9.2402 LearningRate 0.1014 Epoch: 7 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:01,190-Speed 3496.67 samples/sec Loss 9.4249 LearningRate 0.1013 Epoch: 7 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-19 20:44:04,153-Speed 3457.30 samples/sec Loss 9.3925 LearningRate 0.1013 Epoch: 7 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:07,076-Speed 3503.77 samples/sec Loss 9.2681 LearningRate 0.1013 Epoch: 7 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:10,005-Speed 3497.28 samples/sec Loss 9.5098 LearningRate 0.1013 Epoch: 7 Global Step: 36470 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:44:12,940-Speed 3490.27 samples/sec Loss 9.3339 LearningRate 0.1012 Epoch: 7 Global Step: 36480 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:44:15,922-Speed 3433.70 samples/sec Loss 9.2281 LearningRate 0.1012 Epoch: 7 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:18,869-Speed 3477.45 samples/sec Loss 9.2980 LearningRate 0.1012 Epoch: 7 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:21,797-Speed 3497.68 samples/sec Loss 9.4500 LearningRate 0.1012 Epoch: 7 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:24,733-Speed 3488.64 samples/sec Loss 9.4820 LearningRate 0.1012 Epoch: 7 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:27,705-Speed 3447.73 samples/sec Loss 9.3610 LearningRate 0.1011 Epoch: 7 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:30,735-Speed 3380.00 samples/sec Loss 9.4215 LearningRate 0.1011 Epoch: 7 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:33,716-Speed 3436.43 samples/sec Loss 9.4903 LearningRate 0.1011 Epoch: 7 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:36,659-Speed 3479.99 samples/sec Loss 9.3479 LearningRate 0.1011 Epoch: 7 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:39,590-Speed 3494.82 
samples/sec Loss 9.4281 LearningRate 0.1010 Epoch: 7 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:42,522-Speed 3493.43 samples/sec Loss 9.4238 LearningRate 0.1010 Epoch: 7 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:45,445-Speed 3504.45 samples/sec Loss 9.4648 LearningRate 0.1010 Epoch: 7 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:48,376-Speed 3500.20 samples/sec Loss 9.4773 LearningRate 0.1010 Epoch: 7 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:51,302-Speed 3501.51 samples/sec Loss 9.4485 LearningRate 0.1010 Epoch: 7 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:54,224-Speed 3505.57 samples/sec Loss 9.4503 LearningRate 0.1009 Epoch: 7 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:44:57,147-Speed 3504.08 samples/sec Loss 9.3538 LearningRate 0.1009 Epoch: 7 Global Step: 36630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:00,068-Speed 3506.03 samples/sec Loss 9.4685 LearningRate 0.1009 Epoch: 7 Global Step: 36640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:03,001-Speed 3493.00 samples/sec Loss 9.3897 LearningRate 0.1009 Epoch: 7 Global Step: 36650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:05,926-Speed 3501.64 samples/sec Loss 9.2873 LearningRate 0.1008 Epoch: 7 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:08,866-Speed 3484.98 samples/sec Loss 9.3425 LearningRate 0.1008 Epoch: 7 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:11,803-Speed 3486.47 samples/sec Loss 9.5433 LearningRate 0.1008 Epoch: 7 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:14,750-Speed 3475.76 samples/sec Loss 9.4402 LearningRate 0.1008 Epoch: 7 
Global Step: 36690 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:45:17,664-Speed 3515.55 samples/sec Loss 9.5023 LearningRate 0.1008 Epoch: 7 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:20,647-Speed 3433.34 samples/sec Loss 9.3195 LearningRate 0.1007 Epoch: 7 Global Step: 36710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:23,575-Speed 3498.36 samples/sec Loss 9.3569 LearningRate 0.1007 Epoch: 7 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:26,500-Speed 3501.62 samples/sec Loss 9.3994 LearningRate 0.1007 Epoch: 7 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:29,425-Speed 3502.88 samples/sec Loss 9.4232 LearningRate 0.1007 Epoch: 7 Global Step: 36740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:32,351-Speed 3499.94 samples/sec Loss 9.4357 LearningRate 0.1006 Epoch: 7 Global Step: 36750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:35,273-Speed 3505.05 samples/sec Loss 9.3984 LearningRate 0.1006 Epoch: 7 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:38,227-Speed 3467.71 samples/sec Loss 9.2679 LearningRate 0.1006 Epoch: 7 Global Step: 36770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:45:41,145-Speed 3510.56 samples/sec Loss 9.2578 LearningRate 0.1006 Epoch: 7 Global Step: 36780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:45:44,077-Speed 3492.77 samples/sec Loss 9.3513 LearningRate 0.1006 Epoch: 7 Global Step: 36790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:45:47,025-Speed 3475.46 samples/sec Loss 9.3759 LearningRate 0.1005 Epoch: 7 Global Step: 36800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:45:49,949-Speed 3502.29 samples/sec Loss 9.3464 LearningRate 0.1005 Epoch: 7 Global Step: 36810 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-01-19 20:45:52,875-Speed 3501.01 samples/sec Loss 9.4689 LearningRate 0.1005 Epoch: 7 Global Step: 36820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:45:55,809-Speed 3491.19 samples/sec Loss 9.3232 LearningRate 0.1005 Epoch: 7 Global Step: 36830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:45:58,738-Speed 3497.01 samples/sec Loss 9.3049 LearningRate 0.1004 Epoch: 7 Global Step: 36840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:01,666-Speed 3498.10 samples/sec Loss 9.2694 LearningRate 0.1004 Epoch: 7 Global Step: 36850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:04,594-Speed 3498.91 samples/sec Loss 9.3387 LearningRate 0.1004 Epoch: 7 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:07,516-Speed 3504.47 samples/sec Loss 9.3980 LearningRate 0.1004 Epoch: 7 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:10,458-Speed 3481.79 samples/sec Loss 9.4083 LearningRate 0.1004 Epoch: 7 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:13,423-Speed 3455.29 samples/sec Loss 9.3354 LearningRate 0.1003 Epoch: 7 Global Step: 36890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:16,352-Speed 3496.78 samples/sec Loss 9.3043 LearningRate 0.1003 Epoch: 7 Global Step: 36900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:19,309-Speed 3464.92 samples/sec Loss 9.3915 LearningRate 0.1003 Epoch: 7 Global Step: 36910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:22,270-Speed 3459.03 samples/sec Loss 9.3198 LearningRate 0.1003 Epoch: 7 Global Step: 36920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:25,237-Speed 3452.10 samples/sec Loss 9.4198 LearningRate 0.1002 Epoch: 7 Global Step: 36930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:28,162-Speed 3502.23 samples/sec 
Loss 9.4642 LearningRate 0.1002 Epoch: 7 Global Step: 36940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:31,094-Speed 3493.09 samples/sec Loss 9.4923 LearningRate 0.1002 Epoch: 7 Global Step: 36950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:34,037-Speed 3480.41 samples/sec Loss 9.2704 LearningRate 0.1002 Epoch: 7 Global Step: 36960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:36,967-Speed 3495.57 samples/sec Loss 9.4224 LearningRate 0.1001 Epoch: 7 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:46:39,923-Speed 3465.04 samples/sec Loss 9.4116 LearningRate 0.1001 Epoch: 7 Global Step: 36980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:46:42,856-Speed 3493.17 samples/sec Loss 9.2587 LearningRate 0.1001 Epoch: 7 Global Step: 36990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:46:45,792-Speed 3487.99 samples/sec Loss 9.2987 LearningRate 0.1001 Epoch: 7 Global Step: 37000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:46:48,715-Speed 3504.87 samples/sec Loss 9.3486 LearningRate 0.1001 Epoch: 7 Global Step: 37010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:46:51,640-Speed 3500.98 samples/sec Loss 9.5910 LearningRate 0.1000 Epoch: 7 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:46:54,569-Speed 3497.05 samples/sec Loss 9.4398 LearningRate 0.1000 Epoch: 7 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:46:57,496-Speed 3500.11 samples/sec Loss 9.2954 LearningRate 0.1000 Epoch: 7 Global Step: 37040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:00,419-Speed 3503.49 samples/sec Loss 9.4892 LearningRate 0.1000 Epoch: 7 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:03,351-Speed 3494.02 samples/sec Loss 9.4823 LearningRate 0.0999 Epoch: 7 Global Step: 37060 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:06,289-Speed 3485.61 samples/sec Loss 9.4917 LearningRate 0.0999 Epoch: 7 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:09,238-Speed 3473.73 samples/sec Loss 9.4682 LearningRate 0.0999 Epoch: 7 Global Step: 37080 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:47:12,226-Speed 3428.20 samples/sec Loss 9.4102 LearningRate 0.0999 Epoch: 7 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:15,218-Speed 3423.28 samples/sec Loss 9.2669 LearningRate 0.0999 Epoch: 7 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:18,147-Speed 3497.71 samples/sec Loss 9.5157 LearningRate 0.0998 Epoch: 7 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:21,145-Speed 3416.10 samples/sec Loss 9.5254 LearningRate 0.0998 Epoch: 7 Global Step: 37120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:24,095-Speed 3472.18 samples/sec Loss 9.2252 LearningRate 0.0998 Epoch: 7 Global Step: 37130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:27,035-Speed 3484.08 samples/sec Loss 9.4499 LearningRate 0.0998 Epoch: 7 Global Step: 37140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:29,977-Speed 3481.33 samples/sec Loss 9.3023 LearningRate 0.0997 Epoch: 7 Global Step: 37150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:32,933-Speed 3464.05 samples/sec Loss 9.3389 LearningRate 0.0997 Epoch: 7 Global Step: 37160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:35,873-Speed 3484.72 samples/sec Loss 9.3973 LearningRate 0.0997 Epoch: 7 Global Step: 37170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:38,818-Speed 3478.54 samples/sec Loss 9.3642 LearningRate 0.0997 Epoch: 7 Global Step: 37180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-19 20:47:41,743-Speed 3501.20 samples/sec Loss 9.2884 LearningRate 0.0997 Epoch: 7 Global Step: 37190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:44,671-Speed 3498.82 samples/sec Loss 9.4661 LearningRate 0.0996 Epoch: 7 Global Step: 37200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:47,601-Speed 3495.95 samples/sec Loss 9.4019 LearningRate 0.0996 Epoch: 7 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:50,548-Speed 3474.65 samples/sec Loss 9.3613 LearningRate 0.0996 Epoch: 7 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:53,489-Speed 3483.35 samples/sec Loss 9.4557 LearningRate 0.0996 Epoch: 7 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:56,432-Speed 3480.82 samples/sec Loss 9.4326 LearningRate 0.0995 Epoch: 7 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:47:59,424-Speed 3422.92 samples/sec Loss 9.2630 LearningRate 0.0995 Epoch: 7 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:02,466-Speed 3367.69 samples/sec Loss 9.2237 LearningRate 0.0995 Epoch: 7 Global Step: 37260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:05,397-Speed 3494.13 samples/sec Loss 9.2703 LearningRate 0.0995 Epoch: 7 Global Step: 37270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:08,324-Speed 3499.87 samples/sec Loss 9.1770 LearningRate 0.0995 Epoch: 7 Global Step: 37280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:11,241-Speed 3510.81 samples/sec Loss 9.5035 LearningRate 0.0994 Epoch: 7 Global Step: 37290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:14,173-Speed 3494.64 samples/sec Loss 9.2279 LearningRate 0.0994 Epoch: 7 Global Step: 37300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:17,180-Speed 3405.16 samples/sec Loss 
9.1610 LearningRate 0.0994 Epoch: 7 Global Step: 37310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:20,169-Speed 3427.37 samples/sec Loss 9.3349 LearningRate 0.0994 Epoch: 7 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:23,118-Speed 3473.24 samples/sec Loss 9.3075 LearningRate 0.0993 Epoch: 7 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:26,100-Speed 3434.33 samples/sec Loss 9.4017 LearningRate 0.0993 Epoch: 7 Global Step: 37340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:29,029-Speed 3498.87 samples/sec Loss 9.3622 LearningRate 0.0993 Epoch: 7 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:31,997-Speed 3450.55 samples/sec Loss 9.3568 LearningRate 0.0993 Epoch: 7 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:34,944-Speed 3475.95 samples/sec Loss 9.1497 LearningRate 0.0993 Epoch: 7 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:37,904-Speed 3460.68 samples/sec Loss 9.2541 LearningRate 0.0992 Epoch: 7 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:40,873-Speed 3449.59 samples/sec Loss 9.4317 LearningRate 0.0992 Epoch: 7 Global Step: 37390 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:48:43,821-Speed 3474.30 samples/sec Loss 9.4585 LearningRate 0.0992 Epoch: 7 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:46,757-Speed 3488.39 samples/sec Loss 9.3212 LearningRate 0.0992 Epoch: 7 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:49,686-Speed 3497.81 samples/sec Loss 9.2187 LearningRate 0.0991 Epoch: 7 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:52,726-Speed 3368.91 samples/sec Loss 9.3087 LearningRate 0.0991 Epoch: 7 Global Step: 37430 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:55,692-Speed 3453.64 samples/sec Loss 9.3894 LearningRate 0.0991 Epoch: 7 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:48:58,627-Speed 3489.53 samples/sec Loss 9.3952 LearningRate 0.0991 Epoch: 7 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:01,556-Speed 3497.20 samples/sec Loss 9.2833 LearningRate 0.0991 Epoch: 7 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:04,556-Speed 3414.80 samples/sec Loss 9.3344 LearningRate 0.0990 Epoch: 7 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:07,493-Speed 3486.92 samples/sec Loss 9.5167 LearningRate 0.0990 Epoch: 7 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:10,428-Speed 3490.54 samples/sec Loss 9.2622 LearningRate 0.0990 Epoch: 7 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:13,345-Speed 3511.09 samples/sec Loss 9.4267 LearningRate 0.0990 Epoch: 7 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:16,275-Speed 3495.99 samples/sec Loss 9.2028 LearningRate 0.0989 Epoch: 7 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:19,327-Speed 3355.83 samples/sec Loss 9.2016 LearningRate 0.0989 Epoch: 7 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:22,281-Speed 3467.82 samples/sec Loss 9.3937 LearningRate 0.0989 Epoch: 7 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:25,281-Speed 3413.83 samples/sec Loss 9.4345 LearningRate 0.0989 Epoch: 7 Global Step: 37540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:28,212-Speed 3495.79 samples/sec Loss 9.3965 LearningRate 0.0989 Epoch: 7 Global Step: 37550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-01-19 20:49:31,160-Speed 3474.33 samples/sec Loss 9.4335 LearningRate 0.0988 Epoch: 7 Global Step: 37560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:34,103-Speed 3479.55 samples/sec Loss 9.4797 LearningRate 0.0988 Epoch: 7 Global Step: 37570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:37,049-Speed 3476.57 samples/sec Loss 9.4759 LearningRate 0.0988 Epoch: 7 Global Step: 37580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:40,042-Speed 3423.36 samples/sec Loss 9.3863 LearningRate 0.0988 Epoch: 7 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:43,005-Speed 3456.78 samples/sec Loss 9.2664 LearningRate 0.0987 Epoch: 7 Global Step: 37600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:45,943-Speed 3485.61 samples/sec Loss 9.3345 LearningRate 0.0987 Epoch: 7 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:48,880-Speed 3487.93 samples/sec Loss 9.3878 LearningRate 0.0987 Epoch: 7 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:49:51,887-Speed 3406.04 samples/sec Loss 9.3246 LearningRate 0.0987 Epoch: 7 Global Step: 37630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:54,911-Speed 3387.87 samples/sec Loss 9.2504 LearningRate 0.0987 Epoch: 7 Global Step: 37640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:49:57,845-Speed 3490.60 samples/sec Loss 9.3986 LearningRate 0.0986 Epoch: 7 Global Step: 37650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:00,776-Speed 3494.17 samples/sec Loss 9.3106 LearningRate 0.0986 Epoch: 7 Global Step: 37660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:03,705-Speed 3497.75 samples/sec Loss 9.3797 LearningRate 0.0986 Epoch: 7 Global Step: 37670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:06,695-Speed 3425.78 samples/sec Loss 9.2823 
LearningRate 0.0986 Epoch: 7 Global Step: 37680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:09,657-Speed 3458.40 samples/sec Loss 9.1800 LearningRate 0.0985 Epoch: 7 Global Step: 37690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:12,610-Speed 3468.75 samples/sec Loss 9.3629 LearningRate 0.0985 Epoch: 7 Global Step: 37700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:15,561-Speed 3470.57 samples/sec Loss 9.1952 LearningRate 0.0985 Epoch: 7 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:18,493-Speed 3494.40 samples/sec Loss 9.2355 LearningRate 0.0985 Epoch: 7 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:21,424-Speed 3494.13 samples/sec Loss 9.2551 LearningRate 0.0985 Epoch: 7 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:24,365-Speed 3482.86 samples/sec Loss 9.1379 LearningRate 0.0984 Epoch: 7 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:27,297-Speed 3493.55 samples/sec Loss 9.3968 LearningRate 0.0984 Epoch: 7 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:30,229-Speed 3493.50 samples/sec Loss 9.2210 LearningRate 0.0984 Epoch: 7 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:33,162-Speed 3492.28 samples/sec Loss 9.2715 LearningRate 0.0984 Epoch: 7 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:36,118-Speed 3464.89 samples/sec Loss 9.2821 LearningRate 0.0983 Epoch: 7 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:39,059-Speed 3482.25 samples/sec Loss 9.4624 LearningRate 0.0983 Epoch: 7 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:42,023-Speed 3456.57 samples/sec Loss 9.1970 LearningRate 0.0983 Epoch: 7 Global Step: 37800 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:44,993-Speed 3448.54 samples/sec Loss 9.3483 LearningRate 0.0983 Epoch: 7 Global Step: 37810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:47,951-Speed 3463.14 samples/sec Loss 9.4148 LearningRate 0.0983 Epoch: 7 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:50,881-Speed 3495.73 samples/sec Loss 9.3131 LearningRate 0.0982 Epoch: 7 Global Step: 37830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:53,811-Speed 3496.66 samples/sec Loss 9.3907 LearningRate 0.0982 Epoch: 7 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:56,742-Speed 3493.62 samples/sec Loss 9.3359 LearningRate 0.0982 Epoch: 7 Global Step: 37850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:50:59,671-Speed 3498.04 samples/sec Loss 9.2881 LearningRate 0.0982 Epoch: 7 Global Step: 37860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:02,596-Speed 3501.32 samples/sec Loss 9.3599 LearningRate 0.0981 Epoch: 7 Global Step: 37870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:05,538-Speed 3482.24 samples/sec Loss 9.2897 LearningRate 0.0981 Epoch: 7 Global Step: 37880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:08,500-Speed 3457.50 samples/sec Loss 9.3584 LearningRate 0.0981 Epoch: 7 Global Step: 37890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:11,456-Speed 3464.85 samples/sec Loss 9.0297 LearningRate 0.0981 Epoch: 7 Global Step: 37900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:14,448-Speed 3423.87 samples/sec Loss 9.4402 LearningRate 0.0981 Epoch: 7 Global Step: 37910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:17,415-Speed 3452.78 samples/sec Loss 9.3420 LearningRate 0.0980 Epoch: 7 Global Step: 37920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-19 20:51:20,339-Speed 3502.74 samples/sec Loss 9.4351 LearningRate 0.0980 Epoch: 7 Global Step: 37930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:23,293-Speed 3467.66 samples/sec Loss 9.4476 LearningRate 0.0980 Epoch: 7 Global Step: 37940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:26,238-Speed 3478.02 samples/sec Loss 9.1592 LearningRate 0.0980 Epoch: 7 Global Step: 37950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:29,167-Speed 3496.64 samples/sec Loss 9.3082 LearningRate 0.0979 Epoch: 7 Global Step: 37960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:32,096-Speed 3496.54 samples/sec Loss 9.3844 LearningRate 0.0979 Epoch: 7 Global Step: 37970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:35,036-Speed 3484.68 samples/sec Loss 9.3502 LearningRate 0.0979 Epoch: 7 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:37,976-Speed 3484.28 samples/sec Loss 9.3151 LearningRate 0.0979 Epoch: 7 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:51:40,913-Speed 3486.48 samples/sec Loss 9.2371 LearningRate 0.0979 Epoch: 7 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:52:24,102-[lfw][38000]XNorm: 24.412518 Training: 2022-01-19 20:52:24,102-[lfw][38000]Accuracy-Flip: 0.99650+-0.00311 Training: 2022-01-19 20:52:24,103-[lfw][38000]Accuracy-Highest: 0.99700 Training: 2022-01-19 20:53:13,990-[cfp_fp][38000]XNorm: 21.691769 Training: 2022-01-19 20:53:13,991-[cfp_fp][38000]Accuracy-Flip: 0.95229+-0.01038 Training: 2022-01-19 20:53:13,992-[cfp_fp][38000]Accuracy-Highest: 0.95743 Training: 2022-01-19 20:53:56,843-[agedb_30][38000]XNorm: 23.680399 Training: 2022-01-19 20:53:56,844-[agedb_30][38000]Accuracy-Flip: 0.96867+-0.00985 Training: 2022-01-19 20:53:56,844-[agedb_30][38000]Accuracy-Highest: 0.97117 Training: 2022-01-19 20:53:59,858-Speed 73.70 
samples/sec Loss 9.4449 LearningRate 0.0978 Epoch: 7 Global Step: 38010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:02,781-Speed 3504.18 samples/sec Loss 9.4387 LearningRate 0.0978 Epoch: 7 Global Step: 38020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:05,717-Speed 3487.81 samples/sec Loss 9.3416 LearningRate 0.0978 Epoch: 7 Global Step: 38030 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:54:08,647-Speed 3496.06 samples/sec Loss 9.4106 LearningRate 0.0978 Epoch: 7 Global Step: 38040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:11,572-Speed 3501.94 samples/sec Loss 9.5418 LearningRate 0.0977 Epoch: 7 Global Step: 38050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:14,551-Speed 3439.02 samples/sec Loss 9.2664 LearningRate 0.0977 Epoch: 7 Global Step: 38060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:17,479-Speed 3498.63 samples/sec Loss 9.4152 LearningRate 0.0977 Epoch: 7 Global Step: 38070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:20,425-Speed 3476.25 samples/sec Loss 9.0655 LearningRate 0.0977 Epoch: 7 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:23,355-Speed 3496.24 samples/sec Loss 9.1213 LearningRate 0.0977 Epoch: 7 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:26,292-Speed 3487.03 samples/sec Loss 9.2873 LearningRate 0.0976 Epoch: 7 Global Step: 38100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:29,237-Speed 3479.77 samples/sec Loss 9.3608 LearningRate 0.0976 Epoch: 7 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:32,193-Speed 3463.81 samples/sec Loss 9.2155 LearningRate 0.0976 Epoch: 7 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:35,132-Speed 3485.61 samples/sec Loss 9.2321 LearningRate 0.0976 Epoch: 7 
Global Step: 38130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:38,056-Speed 3503.24 samples/sec Loss 9.3235 LearningRate 0.0975 Epoch: 7 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:40,995-Speed 3484.98 samples/sec Loss 9.5691 LearningRate 0.0975 Epoch: 7 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:43,933-Speed 3486.96 samples/sec Loss 9.3186 LearningRate 0.0975 Epoch: 7 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:54:46,857-Speed 3502.52 samples/sec Loss 9.1063 LearningRate 0.0975 Epoch: 7 Global Step: 38170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:54:49,780-Speed 3505.09 samples/sec Loss 9.2475 LearningRate 0.0975 Epoch: 7 Global Step: 38180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:54:52,726-Speed 3475.66 samples/sec Loss 9.1694 LearningRate 0.0974 Epoch: 7 Global Step: 38190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:54:55,669-Speed 3480.44 samples/sec Loss 9.3573 LearningRate 0.0974 Epoch: 7 Global Step: 38200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:54:58,654-Speed 3431.93 samples/sec Loss 9.2867 LearningRate 0.0974 Epoch: 7 Global Step: 38210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:55:01,598-Speed 3478.68 samples/sec Loss 9.2210 LearningRate 0.0974 Epoch: 7 Global Step: 38220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:55:04,557-Speed 3462.48 samples/sec Loss 9.2225 LearningRate 0.0973 Epoch: 7 Global Step: 38230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:55:07,514-Speed 3463.57 samples/sec Loss 9.1672 LearningRate 0.0973 Epoch: 7 Global Step: 38240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:55:10,450-Speed 3488.24 samples/sec Loss 9.3163 LearningRate 0.0973 Epoch: 7 Global Step: 38250 Fp16 Grad Scale: 32768 Required: 9 hours 
Training: 2022-01-19 20:55:13,398-Speed 3475.92 samples/sec Loss 9.2220 LearningRate 0.0973 Epoch: 7 Global Step: 38260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:55:16,332-Speed 3490.04 samples/sec Loss 9.3891 LearningRate 0.0973 Epoch: 7 Global Step: 38270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-19 20:55:19,262-Speed 3497.81 samples/sec Loss 9.3318 LearningRate 0.0972 Epoch: 7 Global Step: 38280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:22,201-Speed 3484.47 samples/sec Loss 9.2739 LearningRate 0.0972 Epoch: 7 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:25,195-Speed 3420.72 samples/sec Loss 9.1711 LearningRate 0.0972 Epoch: 7 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:28,130-Speed 3490.46 samples/sec Loss 9.2744 LearningRate 0.0972 Epoch: 7 Global Step: 38310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:31,077-Speed 3475.56 samples/sec Loss 9.2944 LearningRate 0.0971 Epoch: 7 Global Step: 38320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:34,018-Speed 3483.64 samples/sec Loss 9.4830 LearningRate 0.0971 Epoch: 7 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:36,981-Speed 3456.86 samples/sec Loss 9.3119 LearningRate 0.0971 Epoch: 7 Global Step: 38340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:39,916-Speed 3490.02 samples/sec Loss 9.4307 LearningRate 0.0971 Epoch: 7 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:42,847-Speed 3494.27 samples/sec Loss 9.2504 LearningRate 0.0971 Epoch: 7 Global Step: 38360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:45,783-Speed 3488.64 samples/sec Loss 9.5138 LearningRate 0.0970 Epoch: 7 Global Step: 38370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:55:48,725-Speed 3481.63 samples/sec Loss 
9.3985 LearningRate 0.0970 Epoch: 7 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:55:51,652-Speed 3499.48 samples/sec Loss 9.1678 LearningRate 0.0970 Epoch: 7 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:55:54,592-Speed 3484.66 samples/sec Loss 9.2715 LearningRate 0.0970 Epoch: 7 Global Step: 38400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:55:57,524-Speed 3492.84 samples/sec Loss 9.2129 LearningRate 0.0970 Epoch: 7 Global Step: 38410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:00,451-Speed 3499.16 samples/sec Loss 9.2611 LearningRate 0.0969 Epoch: 7 Global Step: 38420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:03,407-Speed 3465.79 samples/sec Loss 9.2492 LearningRate 0.0969 Epoch: 7 Global Step: 38430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:06,464-Speed 3351.25 samples/sec Loss 9.4158 LearningRate 0.0969 Epoch: 7 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:09,393-Speed 3496.87 samples/sec Loss 9.1925 LearningRate 0.0969 Epoch: 7 Global Step: 38450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:12,326-Speed 3492.55 samples/sec Loss 9.0852 LearningRate 0.0968 Epoch: 7 Global Step: 38460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:15,270-Speed 3478.65 samples/sec Loss 9.1988 LearningRate 0.0968 Epoch: 7 Global Step: 38470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:18,292-Speed 3390.34 samples/sec Loss 9.3108 LearningRate 0.0968 Epoch: 7 Global Step: 38480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:21,298-Speed 3407.64 samples/sec Loss 9.1820 LearningRate 0.0968 Epoch: 7 Global Step: 38490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:24,240-Speed 3482.11 samples/sec Loss 9.1944 LearningRate 0.0968 Epoch: 7 Global Step: 38500 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:27,181-Speed 3482.28 samples/sec Loss 9.3176 LearningRate 0.0967 Epoch: 7 Global Step: 38510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:31,198-Speed 2549.46 samples/sec Loss 9.2741 LearningRate 0.0967 Epoch: 7 Global Step: 38520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:56:34,163-Speed 3455.67 samples/sec Loss 9.2828 LearningRate 0.0967 Epoch: 7 Global Step: 38530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:37,101-Speed 3486.03 samples/sec Loss 9.1492 LearningRate 0.0967 Epoch: 7 Global Step: 38540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:40,033-Speed 3493.06 samples/sec Loss 9.2595 LearningRate 0.0966 Epoch: 7 Global Step: 38550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:43,002-Speed 3450.01 samples/sec Loss 9.1398 LearningRate 0.0966 Epoch: 7 Global Step: 38560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:45,940-Speed 3486.48 samples/sec Loss 9.3435 LearningRate 0.0966 Epoch: 7 Global Step: 38570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:48,875-Speed 3489.17 samples/sec Loss 9.2402 LearningRate 0.0966 Epoch: 7 Global Step: 38580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:51,811-Speed 3488.96 samples/sec Loss 9.2912 LearningRate 0.0966 Epoch: 7 Global Step: 38590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:54,828-Speed 3395.63 samples/sec Loss 9.2276 LearningRate 0.0965 Epoch: 7 Global Step: 38600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:56:57,758-Speed 3494.70 samples/sec Loss 9.1901 LearningRate 0.0965 Epoch: 7 Global Step: 38610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:00,701-Speed 3480.70 samples/sec Loss 9.4103 LearningRate 0.0965 Epoch: 7 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 
20:57:03,623-Speed 3506.07 samples/sec Loss 9.3710 LearningRate 0.0965 Epoch: 7 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:06,550-Speed 3499.78 samples/sec Loss 9.4668 LearningRate 0.0964 Epoch: 7 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:09,482-Speed 3493.71 samples/sec Loss 9.3816 LearningRate 0.0964 Epoch: 7 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:12,413-Speed 3493.45 samples/sec Loss 9.2078 LearningRate 0.0964 Epoch: 7 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:15,362-Speed 3473.89 samples/sec Loss 9.1558 LearningRate 0.0964 Epoch: 7 Global Step: 38670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:18,299-Speed 3486.88 samples/sec Loss 9.3296 LearningRate 0.0964 Epoch: 7 Global Step: 38680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:21,240-Speed 3483.26 samples/sec Loss 9.0593 LearningRate 0.0963 Epoch: 7 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:24,196-Speed 3465.39 samples/sec Loss 9.0689 LearningRate 0.0963 Epoch: 7 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:27,137-Speed 3482.49 samples/sec Loss 9.2403 LearningRate 0.0963 Epoch: 7 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:30,082-Speed 3478.80 samples/sec Loss 9.3424 LearningRate 0.0963 Epoch: 7 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:33,038-Speed 3464.56 samples/sec Loss 9.3803 LearningRate 0.0962 Epoch: 7 Global Step: 38730 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 20:57:35,981-Speed 3481.02 samples/sec Loss 9.2447 LearningRate 0.0962 Epoch: 7 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:38,935-Speed 3466.59 samples/sec Loss 9.2943 
LearningRate 0.0962 Epoch: 7 Global Step: 38750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:57:41,886-Speed 3470.90 samples/sec Loss 9.2972 LearningRate 0.0962 Epoch: 7 Global Step: 38760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:57:44,868-Speed 3435.43 samples/sec Loss 9.2045 LearningRate 0.0962 Epoch: 7 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:57:47,805-Speed 3487.42 samples/sec Loss 9.2809 LearningRate 0.0961 Epoch: 7 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:57:50,781-Speed 3441.28 samples/sec Loss 9.1799 LearningRate 0.0961 Epoch: 7 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:57:53,736-Speed 3467.17 samples/sec Loss 9.2167 LearningRate 0.0961 Epoch: 7 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:57:56,688-Speed 3470.35 samples/sec Loss 9.2625 LearningRate 0.0961 Epoch: 7 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:57:59,729-Speed 3367.92 samples/sec Loss 9.0243 LearningRate 0.0961 Epoch: 7 Global Step: 38820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:58:02,689-Speed 3460.97 samples/sec Loss 9.4794 LearningRate 0.0960 Epoch: 7 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:58:05,681-Speed 3423.26 samples/sec Loss 9.2180 LearningRate 0.0960 Epoch: 7 Global Step: 38840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:58:08,627-Speed 3476.70 samples/sec Loss 9.1086 LearningRate 0.0960 Epoch: 7 Global Step: 38850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:58:11,565-Speed 3485.53 samples/sec Loss 9.1123 LearningRate 0.0960 Epoch: 7 Global Step: 38860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:14,559-Speed 3421.15 samples/sec Loss 9.1321 LearningRate 0.0959 Epoch: 7 Global Step: 38870 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-01-19 20:58:17,512-Speed 3468.72 samples/sec Loss 9.4020 LearningRate 0.0959 Epoch: 7 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:20,448-Speed 3488.44 samples/sec Loss 9.2884 LearningRate 0.0959 Epoch: 7 Global Step: 38890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:23,459-Speed 3402.48 samples/sec Loss 9.1089 LearningRate 0.0959 Epoch: 7 Global Step: 38900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:26,464-Speed 3408.89 samples/sec Loss 9.0983 LearningRate 0.0959 Epoch: 7 Global Step: 38910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:29,404-Speed 3483.41 samples/sec Loss 8.9963 LearningRate 0.0958 Epoch: 7 Global Step: 38920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:32,340-Speed 3488.69 samples/sec Loss 9.2339 LearningRate 0.0958 Epoch: 7 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:35,273-Speed 3492.04 samples/sec Loss 9.0970 LearningRate 0.0958 Epoch: 7 Global Step: 38940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:38,209-Speed 3488.55 samples/sec Loss 9.5466 LearningRate 0.0958 Epoch: 7 Global Step: 38950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:41,137-Speed 3498.03 samples/sec Loss 9.2974 LearningRate 0.0957 Epoch: 7 Global Step: 38960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:44,079-Speed 3481.79 samples/sec Loss 9.1590 LearningRate 0.0957 Epoch: 7 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:47,009-Speed 3495.78 samples/sec Loss 9.2512 LearningRate 0.0957 Epoch: 7 Global Step: 38980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:49,939-Speed 3496.17 samples/sec Loss 9.2832 LearningRate 0.0957 Epoch: 7 Global Step: 38990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 
20:58:52,902-Speed 3456.86 samples/sec Loss 9.2726 LearningRate 0.0957 Epoch: 7 Global Step: 39000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:55,867-Speed 3454.10 samples/sec Loss 9.3660 LearningRate 0.0956 Epoch: 7 Global Step: 39010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:58:58,843-Speed 3442.47 samples/sec Loss 9.2571 LearningRate 0.0956 Epoch: 7 Global Step: 39020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:01,848-Speed 3408.22 samples/sec Loss 9.1250 LearningRate 0.0956 Epoch: 7 Global Step: 39030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:04,848-Speed 3415.00 samples/sec Loss 9.3873 LearningRate 0.0956 Epoch: 7 Global Step: 39040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:07,948-Speed 3304.06 samples/sec Loss 9.2892 LearningRate 0.0955 Epoch: 7 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:10,947-Speed 3414.50 samples/sec Loss 9.1445 LearningRate 0.0955 Epoch: 7 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:13,919-Speed 3447.32 samples/sec Loss 9.1282 LearningRate 0.0955 Epoch: 7 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:16,968-Speed 3358.70 samples/sec Loss 9.0996 LearningRate 0.0955 Epoch: 7 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:20,057-Speed 3316.22 samples/sec Loss 8.9284 LearningRate 0.0955 Epoch: 7 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:23,072-Speed 3397.11 samples/sec Loss 9.2830 LearningRate 0.0954 Epoch: 7 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:26,073-Speed 3413.50 samples/sec Loss 9.3063 LearningRate 0.0954 Epoch: 7 Global Step: 39110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:29,042-Speed 3450.43 samples/sec Loss 9.1944 
LearningRate 0.0954 Epoch: 7 Global Step: 39120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:31,999-Speed 3462.82 samples/sec Loss 8.9994 LearningRate 0.0954 Epoch: 7 Global Step: 39130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:34,984-Speed 3432.45 samples/sec Loss 9.1884 LearningRate 0.0953 Epoch: 7 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:37,922-Speed 3485.53 samples/sec Loss 9.1845 LearningRate 0.0953 Epoch: 7 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:40,859-Speed 3487.22 samples/sec Loss 9.1445 LearningRate 0.0953 Epoch: 7 Global Step: 39160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:43,800-Speed 3483.74 samples/sec Loss 9.0775 LearningRate 0.0953 Epoch: 7 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:46,731-Speed 3493.71 samples/sec Loss 9.1443 LearningRate 0.0953 Epoch: 7 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:49,705-Speed 3444.59 samples/sec Loss 9.2876 LearningRate 0.0952 Epoch: 7 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:52,638-Speed 3492.07 samples/sec Loss 9.1294 LearningRate 0.0952 Epoch: 7 Global Step: 39200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 20:59:55,575-Speed 3488.09 samples/sec Loss 9.2035 LearningRate 0.0952 Epoch: 7 Global Step: 39210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 20:59:58,498-Speed 3504.23 samples/sec Loss 9.0310 LearningRate 0.0952 Epoch: 7 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:01,435-Speed 3487.18 samples/sec Loss 9.3225 LearningRate 0.0952 Epoch: 7 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:04,379-Speed 3479.82 samples/sec Loss 9.2731 LearningRate 0.0951 Epoch: 7 Global Step: 39240 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-01-19 21:00:07,309-Speed 3495.63 samples/sec Loss 9.2131 LearningRate 0.0951 Epoch: 7 Global Step: 39250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:10,243-Speed 3492.04 samples/sec Loss 9.3112 LearningRate 0.0951 Epoch: 7 Global Step: 39260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:13,206-Speed 3456.90 samples/sec Loss 9.1071 LearningRate 0.0951 Epoch: 7 Global Step: 39270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:16,195-Speed 3426.27 samples/sec Loss 9.1662 LearningRate 0.0950 Epoch: 7 Global Step: 39280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:19,138-Speed 3481.55 samples/sec Loss 9.1354 LearningRate 0.0950 Epoch: 7 Global Step: 39290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:22,128-Speed 3424.68 samples/sec Loss 9.1875 LearningRate 0.0950 Epoch: 7 Global Step: 39300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:25,121-Speed 3423.05 samples/sec Loss 9.1285 LearningRate 0.0950 Epoch: 7 Global Step: 39310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:00:28,080-Speed 3460.56 samples/sec Loss 9.1532 LearningRate 0.0950 Epoch: 7 Global Step: 39320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:31,081-Speed 3413.88 samples/sec Loss 9.1600 LearningRate 0.0949 Epoch: 7 Global Step: 39330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:34,011-Speed 3496.13 samples/sec Loss 9.2106 LearningRate 0.0949 Epoch: 7 Global Step: 39340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:36,970-Speed 3460.91 samples/sec Loss 9.0884 LearningRate 0.0949 Epoch: 7 Global Step: 39350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:39,957-Speed 3430.31 samples/sec Loss 9.1314 LearningRate 0.0949 Epoch: 7 Global Step: 39360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 
21:00:42,961-Speed 3409.70 samples/sec Loss 9.1455 LearningRate 0.0948 Epoch: 7 Global Step: 39370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:45,893-Speed 3493.05 samples/sec Loss 9.0197 LearningRate 0.0948 Epoch: 7 Global Step: 39380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:48,827-Speed 3491.28 samples/sec Loss 9.0474 LearningRate 0.0948 Epoch: 7 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:51,771-Speed 3478.41 samples/sec Loss 9.0493 LearningRate 0.0948 Epoch: 7 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:54,719-Speed 3475.59 samples/sec Loss 9.3311 LearningRate 0.0948 Epoch: 7 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:00:57,646-Speed 3499.07 samples/sec Loss 9.1953 LearningRate 0.0947 Epoch: 7 Global Step: 39420 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:01:00,567-Speed 3506.54 samples/sec Loss 9.0836 LearningRate 0.0947 Epoch: 7 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:03,521-Speed 3468.52 samples/sec Loss 9.2264 LearningRate 0.0947 Epoch: 7 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:06,466-Speed 3477.21 samples/sec Loss 9.2738 LearningRate 0.0947 Epoch: 7 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:09,455-Speed 3427.91 samples/sec Loss 9.1470 LearningRate 0.0947 Epoch: 7 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:12,444-Speed 3426.95 samples/sec Loss 9.2624 LearningRate 0.0946 Epoch: 7 Global Step: 39470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:15,457-Speed 3399.42 samples/sec Loss 9.1889 LearningRate 0.0946 Epoch: 7 Global Step: 39480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:18,448-Speed 3423.64 samples/sec Loss 9.1652 
LearningRate 0.0946 Epoch: 7 Global Step: 39490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:21,515-Speed 3339.60 samples/sec Loss 9.2447 LearningRate 0.0946 Epoch: 7 Global Step: 39500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:24,516-Speed 3413.80 samples/sec Loss 9.3209 LearningRate 0.0945 Epoch: 7 Global Step: 39510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:27,479-Speed 3456.96 samples/sec Loss 9.0208 LearningRate 0.0945 Epoch: 7 Global Step: 39520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:30,415-Speed 3488.83 samples/sec Loss 9.1312 LearningRate 0.0945 Epoch: 7 Global Step: 39530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:33,370-Speed 3466.19 samples/sec Loss 9.0626 LearningRate 0.0945 Epoch: 7 Global Step: 39540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:36,335-Speed 3454.44 samples/sec Loss 9.1643 LearningRate 0.0945 Epoch: 7 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:39,272-Speed 3487.85 samples/sec Loss 9.2639 LearningRate 0.0944 Epoch: 7 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:42,233-Speed 3459.65 samples/sec Loss 9.1565 LearningRate 0.0944 Epoch: 7 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:45,179-Speed 3476.19 samples/sec Loss 9.1662 LearningRate 0.0944 Epoch: 7 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:48,111-Speed 3493.41 samples/sec Loss 9.0683 LearningRate 0.0944 Epoch: 7 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:51,042-Speed 3494.16 samples/sec Loss 9.0486 LearningRate 0.0943 Epoch: 7 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:54,068-Speed 3385.38 samples/sec Loss 9.0898 LearningRate 0.0943 Epoch: 7 Global Step: 39610 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:01:57,031-Speed 3456.84 samples/sec Loss 9.1972 LearningRate 0.0943 Epoch: 7 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:00,001-Speed 3449.06 samples/sec Loss 9.0787 LearningRate 0.0943 Epoch: 7 Global Step: 39630 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:02:02,954-Speed 3468.74 samples/sec Loss 9.0736 LearningRate 0.0943 Epoch: 7 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:05,988-Speed 3376.26 samples/sec Loss 9.1071 LearningRate 0.0942 Epoch: 7 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:08,969-Speed 3435.93 samples/sec Loss 9.1316 LearningRate 0.0942 Epoch: 7 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:11,906-Speed 3487.85 samples/sec Loss 9.2665 LearningRate 0.0942 Epoch: 7 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:14,849-Speed 3479.82 samples/sec Loss 9.1637 LearningRate 0.0942 Epoch: 7 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:17,783-Speed 3490.56 samples/sec Loss 9.0246 LearningRate 0.0942 Epoch: 7 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:20,722-Speed 3485.73 samples/sec Loss 9.2036 LearningRate 0.0941 Epoch: 7 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:23,682-Speed 3460.47 samples/sec Loss 8.9739 LearningRate 0.0941 Epoch: 7 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:26,629-Speed 3475.35 samples/sec Loss 9.1583 LearningRate 0.0941 Epoch: 7 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:29,615-Speed 3430.50 samples/sec Loss 9.0631 LearningRate 0.0941 Epoch: 7 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-19 21:02:32,639-Speed 3387.02 samples/sec Loss 9.1164 LearningRate 0.0940 Epoch: 7 Global Step: 39740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:35,595-Speed 3465.16 samples/sec Loss 9.0724 LearningRate 0.0940 Epoch: 7 Global Step: 39750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:38,548-Speed 3468.95 samples/sec Loss 9.1794 LearningRate 0.0940 Epoch: 7 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:41,564-Speed 3395.38 samples/sec Loss 9.2514 LearningRate 0.0940 Epoch: 7 Global Step: 39770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:44,633-Speed 3338.63 samples/sec Loss 9.1095 LearningRate 0.0940 Epoch: 7 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:47,623-Speed 3425.04 samples/sec Loss 9.1013 LearningRate 0.0939 Epoch: 7 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:50,563-Speed 3484.00 samples/sec Loss 9.2630 LearningRate 0.0939 Epoch: 7 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:53,597-Speed 3376.00 samples/sec Loss 9.1271 LearningRate 0.0939 Epoch: 7 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:56,602-Speed 3407.71 samples/sec Loss 9.2181 LearningRate 0.0939 Epoch: 7 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:02:59,602-Speed 3414.78 samples/sec Loss 9.0233 LearningRate 0.0938 Epoch: 7 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:03:02,531-Speed 3496.97 samples/sec Loss 9.2131 LearningRate 0.0938 Epoch: 7 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:03:05,481-Speed 3473.28 samples/sec Loss 8.9737 LearningRate 0.0938 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:03:08,557-Speed 3328.87 samples/sec Loss 
9.1806 LearningRate 0.0938 Epoch: 7 Global Step: 39860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:03:11,558-Speed 3413.72 samples/sec Loss 9.0778 LearningRate 0.0938 Epoch: 7 Global Step: 39870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:03:14,512-Speed 3467.37 samples/sec Loss 9.1592 LearningRate 0.0937 Epoch: 7 Global Step: 39880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:03:17,469-Speed 3463.37 samples/sec Loss 9.1842 LearningRate 0.0937 Epoch: 7 Global Step: 39890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:03:20,439-Speed 3449.45 samples/sec Loss 9.0675 LearningRate 0.0937 Epoch: 7 Global Step: 39900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:23,409-Speed 3447.62 samples/sec Loss 8.9154 LearningRate 0.0937 Epoch: 7 Global Step: 39910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:26,393-Speed 3432.86 samples/sec Loss 8.9402 LearningRate 0.0937 Epoch: 7 Global Step: 39920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:29,360-Speed 3453.36 samples/sec Loss 9.2566 LearningRate 0.0936 Epoch: 7 Global Step: 39930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:32,315-Speed 3465.46 samples/sec Loss 9.0671 LearningRate 0.0936 Epoch: 7 Global Step: 39940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:35,390-Speed 3330.73 samples/sec Loss 9.2595 LearningRate 0.0936 Epoch: 7 Global Step: 39950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:38,396-Speed 3407.44 samples/sec Loss 9.1618 LearningRate 0.0936 Epoch: 7 Global Step: 39960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:41,380-Speed 3432.79 samples/sec Loss 9.0142 LearningRate 0.0935 Epoch: 7 Global Step: 39970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:44,316-Speed 3488.85 samples/sec Loss 9.0103 LearningRate 0.0935 Epoch: 7 Global Step: 39980 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:47,256-Speed 3483.26 samples/sec Loss 9.2060 LearningRate 0.0935 Epoch: 7 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:03:50,195-Speed 3486.03 samples/sec Loss 9.0830 LearningRate 0.0935 Epoch: 7 Global Step: 40000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:04:33,395-[lfw][40000]XNorm: 23.252768 Training: 2022-01-19 21:04:33,395-[lfw][40000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-01-19 21:04:33,396-[lfw][40000]Accuracy-Highest: 0.99700 Training: 2022-01-19 21:05:24,894-[cfp_fp][40000]XNorm: 20.935109 Training: 2022-01-19 21:05:24,895-[cfp_fp][40000]Accuracy-Flip: 0.96214+-0.01127 Training: 2022-01-19 21:05:24,895-[cfp_fp][40000]Accuracy-Highest: 0.96214 Training: 2022-01-19 21:06:08,234-[agedb_30][40000]XNorm: 22.902938 Training: 2022-01-19 21:06:08,235-[agedb_30][40000]Accuracy-Flip: 0.97117+-0.00931 Training: 2022-01-19 21:06:08,235-[agedb_30][40000]Accuracy-Highest: 0.97117 Training: 2022-01-19 21:06:11,222-Speed 72.61 samples/sec Loss 9.1244 LearningRate 0.0935 Epoch: 7 Global Step: 40010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:06:14,184-Speed 3458.13 samples/sec Loss 9.0445 LearningRate 0.0934 Epoch: 7 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:17,246-Speed 3345.44 samples/sec Loss 9.0971 LearningRate 0.0934 Epoch: 7 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:20,222-Speed 3442.57 samples/sec Loss 9.0468 LearningRate 0.0934 Epoch: 7 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:23,149-Speed 3498.68 samples/sec Loss 9.1194 LearningRate 0.0934 Epoch: 7 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:26,150-Speed 3413.41 samples/sec Loss 9.0328 LearningRate 0.0934 Epoch: 7 Global Step: 40060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-01-19 21:06:29,078-Speed 3498.77 samples/sec Loss 9.0095 LearningRate 0.0933 Epoch: 7 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:32,046-Speed 3449.81 samples/sec Loss 9.0612 LearningRate 0.0933 Epoch: 7 Global Step: 40080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:34,981-Speed 3490.45 samples/sec Loss 9.1062 LearningRate 0.0933 Epoch: 7 Global Step: 40090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:37,914-Speed 3492.38 samples/sec Loss 8.9989 LearningRate 0.0933 Epoch: 7 Global Step: 40100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:40,895-Speed 3436.57 samples/sec Loss 9.1139 LearningRate 0.0932 Epoch: 7 Global Step: 40110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:06:43,837-Speed 3480.53 samples/sec Loss 9.2230 LearningRate 0.0932 Epoch: 7 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:06:46,783-Speed 3477.29 samples/sec Loss 9.1853 LearningRate 0.0932 Epoch: 7 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:06:49,753-Speed 3449.29 samples/sec Loss 9.0989 LearningRate 0.0932 Epoch: 7 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:06:52,698-Speed 3478.56 samples/sec Loss 8.9146 LearningRate 0.0932 Epoch: 7 Global Step: 40150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:06:55,682-Speed 3432.43 samples/sec Loss 9.1838 LearningRate 0.0931 Epoch: 7 Global Step: 40160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:06:58,624-Speed 3481.65 samples/sec Loss 9.1452 LearningRate 0.0931 Epoch: 7 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:01,628-Speed 3410.13 samples/sec Loss 9.0491 LearningRate 0.0931 Epoch: 7 Global Step: 40180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:04,623-Speed 3419.94 samples/sec Loss 
8.9045 LearningRate 0.0931 Epoch: 7 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:07,569-Speed 3476.54 samples/sec Loss 8.9728 LearningRate 0.0930 Epoch: 7 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:10,559-Speed 3426.22 samples/sec Loss 9.0411 LearningRate 0.0930 Epoch: 7 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:13,500-Speed 3481.89 samples/sec Loss 9.0501 LearningRate 0.0930 Epoch: 7 Global Step: 40220 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:07:16,527-Speed 3384.59 samples/sec Loss 8.9670 LearningRate 0.0930 Epoch: 7 Global Step: 40230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:19,505-Speed 3439.76 samples/sec Loss 9.0211 LearningRate 0.0930 Epoch: 7 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:22,494-Speed 3426.19 samples/sec Loss 9.0646 LearningRate 0.0929 Epoch: 7 Global Step: 40250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:25,443-Speed 3473.34 samples/sec Loss 9.1264 LearningRate 0.0929 Epoch: 7 Global Step: 40260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:28,390-Speed 3475.51 samples/sec Loss 9.0770 LearningRate 0.0929 Epoch: 7 Global Step: 40270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:31,345-Speed 3465.67 samples/sec Loss 9.0984 LearningRate 0.0929 Epoch: 7 Global Step: 40280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:34,288-Speed 3480.76 samples/sec Loss 8.9713 LearningRate 0.0929 Epoch: 7 Global Step: 40290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:37,245-Speed 3463.34 samples/sec Loss 8.9815 LearningRate 0.0928 Epoch: 7 Global Step: 40300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:40,219-Speed 3444.20 samples/sec Loss 9.0371 LearningRate 0.0928 Epoch: 7 Global Step: 40310 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:43,276-Speed 3351.65 samples/sec Loss 9.0724 LearningRate 0.0928 Epoch: 7 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:46,274-Speed 3416.56 samples/sec Loss 8.9631 LearningRate 0.0928 Epoch: 7 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:49,285-Speed 3402.04 samples/sec Loss 9.0707 LearningRate 0.0927 Epoch: 7 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:52,236-Speed 3470.10 samples/sec Loss 9.1922 LearningRate 0.0927 Epoch: 7 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:55,175-Speed 3487.67 samples/sec Loss 8.8715 LearningRate 0.0927 Epoch: 7 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:07:58,138-Speed 3456.72 samples/sec Loss 9.0304 LearningRate 0.0927 Epoch: 7 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:01,071-Speed 3492.29 samples/sec Loss 8.9777 LearningRate 0.0927 Epoch: 7 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:04,036-Speed 3454.31 samples/sec Loss 9.0335 LearningRate 0.0926 Epoch: 7 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:06,983-Speed 3475.68 samples/sec Loss 8.8128 LearningRate 0.0926 Epoch: 7 Global Step: 40400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:09,985-Speed 3412.17 samples/sec Loss 9.2950 LearningRate 0.0926 Epoch: 7 Global Step: 40410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:12,919-Speed 3491.76 samples/sec Loss 8.9787 LearningRate 0.0926 Epoch: 7 Global Step: 40420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:15,896-Speed 3440.20 samples/sec Loss 9.0628 LearningRate 0.0926 Epoch: 7 Global Step: 40430 Fp16 Grad Scale: 262144 Required: 9 hours Training: 
2022-01-19 21:08:18,824-Speed 3498.77 samples/sec Loss 8.9896 LearningRate 0.0925 Epoch: 7 Global Step: 40440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:21,777-Speed 3467.81 samples/sec Loss 8.8408 LearningRate 0.0925 Epoch: 7 Global Step: 40450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:24,860-Speed 3322.55 samples/sec Loss 9.0725 LearningRate 0.0925 Epoch: 7 Global Step: 40460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:36,812-Speed 856.86 samples/sec Loss 8.5696 LearningRate 0.0925 Epoch: 8 Global Step: 40470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:39,960-Speed 3319.38 samples/sec Loss 8.2759 LearningRate 0.0924 Epoch: 8 Global Step: 40480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:42,988-Speed 3382.13 samples/sec Loss 8.2476 LearningRate 0.0924 Epoch: 8 Global Step: 40490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:45,998-Speed 3463.93 samples/sec Loss 8.2751 LearningRate 0.0924 Epoch: 8 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:48,951-Speed 3468.57 samples/sec Loss 8.1844 LearningRate 0.0924 Epoch: 8 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:51,931-Speed 3436.41 samples/sec Loss 8.1223 LearningRate 0.0924 Epoch: 8 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:54,882-Speed 3471.35 samples/sec Loss 8.3445 LearningRate 0.0923 Epoch: 8 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:08:57,836-Speed 3466.34 samples/sec Loss 8.3122 LearningRate 0.0923 Epoch: 8 Global Step: 40540 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:09:00,762-Speed 3502.19 samples/sec Loss 8.2892 LearningRate 0.0923 Epoch: 8 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:03,693-Speed 3493.57 samples/sec Loss 
8.3374 LearningRate 0.0923 Epoch: 8 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:06,641-Speed 3474.79 samples/sec Loss 8.4664 LearningRate 0.0923 Epoch: 8 Global Step: 40570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:09,621-Speed 3437.11 samples/sec Loss 8.3437 LearningRate 0.0922 Epoch: 8 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:12,592-Speed 3447.86 samples/sec Loss 8.4967 LearningRate 0.0922 Epoch: 8 Global Step: 40590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:15,613-Speed 3390.97 samples/sec Loss 8.4775 LearningRate 0.0922 Epoch: 8 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:18,605-Speed 3422.78 samples/sec Loss 8.4264 LearningRate 0.0922 Epoch: 8 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:21,548-Speed 3480.16 samples/sec Loss 8.6452 LearningRate 0.0921 Epoch: 8 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:24,505-Speed 3464.05 samples/sec Loss 8.4805 LearningRate 0.0921 Epoch: 8 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:27,433-Speed 3498.25 samples/sec Loss 8.4787 LearningRate 0.0921 Epoch: 8 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:30,366-Speed 3492.87 samples/sec Loss 8.5877 LearningRate 0.0921 Epoch: 8 Global Step: 40650 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:09:33,347-Speed 3435.68 samples/sec Loss 8.5557 LearningRate 0.0921 Epoch: 8 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:36,342-Speed 3420.17 samples/sec Loss 8.6629 LearningRate 0.0920 Epoch: 8 Global Step: 40670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:39,305-Speed 3457.82 samples/sec Loss 8.6519 LearningRate 0.0920 Epoch: 8 Global Step: 40680 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:42,292-Speed 3428.27 samples/sec Loss 8.4194 LearningRate 0.0920 Epoch: 8 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:45,376-Speed 3322.34 samples/sec Loss 8.7425 LearningRate 0.0920 Epoch: 8 Global Step: 40700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:48,305-Speed 3496.16 samples/sec Loss 8.8109 LearningRate 0.0920 Epoch: 8 Global Step: 40710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:51,245-Speed 3485.61 samples/sec Loss 8.5262 LearningRate 0.0919 Epoch: 8 Global Step: 40720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:54,183-Speed 3485.75 samples/sec Loss 8.4982 LearningRate 0.0919 Epoch: 8 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:09:57,201-Speed 3393.50 samples/sec Loss 8.4582 LearningRate 0.0919 Epoch: 8 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:00,242-Speed 3369.40 samples/sec Loss 8.5192 LearningRate 0.0919 Epoch: 8 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:03,171-Speed 3496.90 samples/sec Loss 8.6512 LearningRate 0.0918 Epoch: 8 Global Step: 40760 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:10:06,106-Speed 3489.93 samples/sec Loss 8.7513 LearningRate 0.0918 Epoch: 8 Global Step: 40770 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:10:09,109-Speed 3410.74 samples/sec Loss 8.4956 LearningRate 0.0918 Epoch: 8 Global Step: 40780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:12,084-Speed 3442.87 samples/sec Loss 8.6228 LearningRate 0.0918 Epoch: 8 Global Step: 40790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:15,031-Speed 3475.36 samples/sec Loss 8.5765 LearningRate 0.0918 Epoch: 8 Global Step: 40800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-19 21:10:17,970-Speed 3485.58 samples/sec Loss 8.5146 LearningRate 0.0917 Epoch: 8 Global Step: 40810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:20,909-Speed 3485.35 samples/sec Loss 8.6440 LearningRate 0.0917 Epoch: 8 Global Step: 40820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:23,896-Speed 3429.21 samples/sec Loss 8.6401 LearningRate 0.0917 Epoch: 8 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:26,843-Speed 3474.62 samples/sec Loss 8.7946 LearningRate 0.0917 Epoch: 8 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:29,814-Speed 3448.61 samples/sec Loss 8.6449 LearningRate 0.0917 Epoch: 8 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:32,774-Speed 3459.84 samples/sec Loss 8.5904 LearningRate 0.0916 Epoch: 8 Global Step: 40860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:35,725-Speed 3471.56 samples/sec Loss 8.7160 LearningRate 0.0916 Epoch: 8 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:38,644-Speed 3508.35 samples/sec Loss 8.7854 LearningRate 0.0916 Epoch: 8 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:41,594-Speed 3472.54 samples/sec Loss 8.5472 LearningRate 0.0916 Epoch: 8 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:44,521-Speed 3499.67 samples/sec Loss 8.8000 LearningRate 0.0915 Epoch: 8 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:47,579-Speed 3349.06 samples/sec Loss 8.8512 LearningRate 0.0915 Epoch: 8 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:50,605-Speed 3385.52 samples/sec Loss 8.6272 LearningRate 0.0915 Epoch: 8 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:53,549-Speed 3478.40 samples/sec Loss 
8.7723 LearningRate 0.0915 Epoch: 8 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:56,510-Speed 3460.16 samples/sec Loss 8.8314 LearningRate 0.0915 Epoch: 8 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:10:59,462-Speed 3470.46 samples/sec Loss 8.7412 LearningRate 0.0914 Epoch: 8 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:02,400-Speed 3485.57 samples/sec Loss 8.7572 LearningRate 0.0914 Epoch: 8 Global Step: 40960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:05,333-Speed 3492.73 samples/sec Loss 8.6762 LearningRate 0.0914 Epoch: 8 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:08,271-Speed 3486.15 samples/sec Loss 8.8172 LearningRate 0.0914 Epoch: 8 Global Step: 40980 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:11:11,200-Speed 3497.49 samples/sec Loss 8.8634 LearningRate 0.0914 Epoch: 8 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:14,240-Speed 3369.08 samples/sec Loss 8.4723 LearningRate 0.0913 Epoch: 8 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:17,181-Speed 3482.23 samples/sec Loss 8.8833 LearningRate 0.0913 Epoch: 8 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:20,219-Speed 3372.09 samples/sec Loss 8.7320 LearningRate 0.0913 Epoch: 8 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:23,200-Speed 3436.02 samples/sec Loss 8.5752 LearningRate 0.0913 Epoch: 8 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:26,136-Speed 3489.18 samples/sec Loss 8.6776 LearningRate 0.0912 Epoch: 8 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:29,101-Speed 3454.49 samples/sec Loss 8.5741 LearningRate 0.0912 Epoch: 8 Global Step: 41050 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:32,069-Speed 3449.87 samples/sec Loss 8.7773 LearningRate 0.0912 Epoch: 8 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:35,014-Speed 3478.57 samples/sec Loss 8.7233 LearningRate 0.0912 Epoch: 8 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:37,948-Speed 3491.19 samples/sec Loss 8.6965 LearningRate 0.0912 Epoch: 8 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:11:40,879-Speed 3494.65 samples/sec Loss 8.9736 LearningRate 0.0911 Epoch: 8 Global Step: 41090 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:11:43,807-Speed 3498.49 samples/sec Loss 8.7644 LearningRate 0.0911 Epoch: 8 Global Step: 41100 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:11:46,713-Speed 3523.79 samples/sec Loss 8.9261 LearningRate 0.0911 Epoch: 8 Global Step: 41110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:11:49,660-Speed 3475.44 samples/sec Loss 8.6248 LearningRate 0.0911 Epoch: 8 Global Step: 41120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:11:52,604-Speed 3479.48 samples/sec Loss 8.8467 LearningRate 0.0911 Epoch: 8 Global Step: 41130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:11:55,559-Speed 3466.34 samples/sec Loss 8.8483 LearningRate 0.0910 Epoch: 8 Global Step: 41140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:11:58,505-Speed 3477.42 samples/sec Loss 8.6610 LearningRate 0.0910 Epoch: 8 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:12:01,457-Speed 3469.26 samples/sec Loss 8.6374 LearningRate 0.0910 Epoch: 8 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:12:04,424-Speed 3451.74 samples/sec Loss 8.6332 LearningRate 0.0910 Epoch: 8 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 
21:12:07,424-Speed 3414.04 samples/sec Loss 8.8650 LearningRate 0.0909 Epoch: 8 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:12:10,405-Speed 3436.53 samples/sec Loss 8.8246 LearningRate 0.0909 Epoch: 8 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:12:13,360-Speed 3466.49 samples/sec Loss 8.9635 LearningRate 0.0909 Epoch: 8 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:12:16,296-Speed 3489.15 samples/sec Loss 8.7524 LearningRate 0.0909 Epoch: 8 Global Step: 41210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:19,230-Speed 3491.01 samples/sec Loss 8.7692 LearningRate 0.0909 Epoch: 8 Global Step: 41220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:22,168-Speed 3486.42 samples/sec Loss 8.7544 LearningRate 0.0908 Epoch: 8 Global Step: 41230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:25,109-Speed 3481.97 samples/sec Loss 8.7925 LearningRate 0.0908 Epoch: 8 Global Step: 41240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:28,100-Speed 3424.78 samples/sec Loss 8.7654 LearningRate 0.0908 Epoch: 8 Global Step: 41250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:31,112-Speed 3400.34 samples/sec Loss 8.9407 LearningRate 0.0908 Epoch: 8 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:34,080-Speed 3451.70 samples/sec Loss 8.7490 LearningRate 0.0908 Epoch: 8 Global Step: 41270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:37,040-Speed 3459.24 samples/sec Loss 8.8482 LearningRate 0.0907 Epoch: 8 Global Step: 41280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:39,981-Speed 3483.55 samples/sec Loss 8.8599 LearningRate 0.0907 Epoch: 8 Global Step: 41290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:42,920-Speed 3485.04 samples/sec Loss 8.9727 
LearningRate 0.0907 Epoch: 8 Global Step: 41300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:45,913-Speed 3422.30 samples/sec Loss 8.8981 LearningRate 0.0907 Epoch: 8 Global Step: 41310 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:12:48,867-Speed 3467.57 samples/sec Loss 8.9152 LearningRate 0.0906 Epoch: 8 Global Step: 41320 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:12:51,892-Speed 3386.29 samples/sec Loss 8.8218 LearningRate 0.0906 Epoch: 8 Global Step: 41330 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:12:54,821-Speed 3497.08 samples/sec Loss 8.9517 LearningRate 0.0906 Epoch: 8 Global Step: 41340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:12:57,780-Speed 3461.80 samples/sec Loss 8.8938 LearningRate 0.0906 Epoch: 8 Global Step: 41350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:00,729-Speed 3473.07 samples/sec Loss 8.8629 LearningRate 0.0906 Epoch: 8 Global Step: 41360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:03,757-Speed 3382.74 samples/sec Loss 8.9330 LearningRate 0.0905 Epoch: 8 Global Step: 41370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:06,720-Speed 3456.68 samples/sec Loss 8.8365 LearningRate 0.0905 Epoch: 8 Global Step: 41380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:09,693-Speed 3445.48 samples/sec Loss 8.9892 LearningRate 0.0905 Epoch: 8 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:12,630-Speed 3488.21 samples/sec Loss 8.8525 LearningRate 0.0905 Epoch: 8 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:15,561-Speed 3494.47 samples/sec Loss 8.8633 LearningRate 0.0905 Epoch: 8 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:18,542-Speed 3435.80 samples/sec Loss 8.8401 LearningRate 0.0904 Epoch: 8 Global Step: 41420 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:21,565-Speed 3388.56 samples/sec Loss 8.8586 LearningRate 0.0904 Epoch: 8 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:24,561-Speed 3418.09 samples/sec Loss 9.0140 LearningRate 0.0904 Epoch: 8 Global Step: 41440 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:13:27,546-Speed 3431.17 samples/sec Loss 8.9668 LearningRate 0.0904 Epoch: 8 Global Step: 41450 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:13:30,467-Speed 3508.32 samples/sec Loss 8.7945 LearningRate 0.0903 Epoch: 8 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:33,406-Speed 3484.50 samples/sec Loss 8.7721 LearningRate 0.0903 Epoch: 8 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:36,349-Speed 3480.01 samples/sec Loss 8.8079 LearningRate 0.0903 Epoch: 8 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:39,295-Speed 3477.37 samples/sec Loss 8.6437 LearningRate 0.0903 Epoch: 8 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:42,244-Speed 3473.87 samples/sec Loss 9.0552 LearningRate 0.0903 Epoch: 8 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:45,253-Speed 3404.85 samples/sec Loss 8.8526 LearningRate 0.0902 Epoch: 8 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:48,185-Speed 3493.90 samples/sec Loss 8.8741 LearningRate 0.0902 Epoch: 8 Global Step: 41520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:51,139-Speed 3467.46 samples/sec Loss 8.8123 LearningRate 0.0902 Epoch: 8 Global Step: 41530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:13:54,157-Speed 3392.89 samples/sec Loss 8.8655 LearningRate 0.0902 Epoch: 8 Global Step: 41540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-19 21:13:57,132-Speed 3443.58 samples/sec Loss 8.8085 LearningRate 0.0902 Epoch: 8 Global Step: 41550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:00,104-Speed 3446.41 samples/sec Loss 8.9048 LearningRate 0.0901 Epoch: 8 Global Step: 41560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:03,035-Speed 3495.07 samples/sec Loss 8.6873 LearningRate 0.0901 Epoch: 8 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:05,985-Speed 3472.31 samples/sec Loss 8.8215 LearningRate 0.0901 Epoch: 8 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:08,950-Speed 3454.46 samples/sec Loss 8.8366 LearningRate 0.0901 Epoch: 8 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:11,900-Speed 3472.47 samples/sec Loss 8.8180 LearningRate 0.0901 Epoch: 8 Global Step: 41600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:14,951-Speed 3356.48 samples/sec Loss 8.8212 LearningRate 0.0900 Epoch: 8 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:17,999-Speed 3360.15 samples/sec Loss 9.0003 LearningRate 0.0900 Epoch: 8 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:21,037-Speed 3372.32 samples/sec Loss 8.9555 LearningRate 0.0900 Epoch: 8 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:24,104-Speed 3339.82 samples/sec Loss 8.8337 LearningRate 0.0900 Epoch: 8 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:27,059-Speed 3465.97 samples/sec Loss 8.8255 LearningRate 0.0899 Epoch: 8 Global Step: 41650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:30,012-Speed 3468.35 samples/sec Loss 8.9167 LearningRate 0.0899 Epoch: 8 Global Step: 41660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:32,953-Speed 3482.15 samples/sec Loss 
8.7396 LearningRate 0.0899 Epoch: 8 Global Step: 41670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:35,885-Speed 3494.29 samples/sec Loss 8.9176 LearningRate 0.0899 Epoch: 8 Global Step: 41680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:38,880-Speed 3419.73 samples/sec Loss 8.7126 LearningRate 0.0899 Epoch: 8 Global Step: 41690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:41,978-Speed 3306.44 samples/sec Loss 8.7977 LearningRate 0.0898 Epoch: 8 Global Step: 41700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:45,062-Speed 3321.77 samples/sec Loss 8.9281 LearningRate 0.0898 Epoch: 8 Global Step: 41710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:48,006-Speed 3478.76 samples/sec Loss 8.9888 LearningRate 0.0898 Epoch: 8 Global Step: 41720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:50,975-Speed 3449.71 samples/sec Loss 8.8293 LearningRate 0.0898 Epoch: 8 Global Step: 41730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:53,946-Speed 3447.84 samples/sec Loss 8.9659 LearningRate 0.0898 Epoch: 8 Global Step: 41740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:14:56,921-Speed 3443.21 samples/sec Loss 8.9235 LearningRate 0.0897 Epoch: 8 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:14:59,870-Speed 3473.38 samples/sec Loss 8.7365 LearningRate 0.0897 Epoch: 8 Global Step: 41760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:02,807-Speed 3487.27 samples/sec Loss 9.2058 LearningRate 0.0897 Epoch: 8 Global Step: 41770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:05,777-Speed 3448.47 samples/sec Loss 8.8393 LearningRate 0.0897 Epoch: 8 Global Step: 41780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:08,736-Speed 3461.60 samples/sec Loss 9.0220 LearningRate 0.0896 Epoch: 8 Global Step: 41790 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:11,738-Speed 3412.11 samples/sec Loss 8.8681 LearningRate 0.0896 Epoch: 8 Global Step: 41800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:14,736-Speed 3416.54 samples/sec Loss 8.8370 LearningRate 0.0896 Epoch: 8 Global Step: 41810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:17,674-Speed 3486.64 samples/sec Loss 8.8036 LearningRate 0.0896 Epoch: 8 Global Step: 41820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:20,613-Speed 3484.55 samples/sec Loss 8.8777 LearningRate 0.0896 Epoch: 8 Global Step: 41830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:23,558-Speed 3477.60 samples/sec Loss 8.9672 LearningRate 0.0895 Epoch: 8 Global Step: 41840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:26,500-Speed 3482.10 samples/sec Loss 8.7561 LearningRate 0.0895 Epoch: 8 Global Step: 41850 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:15:29,420-Speed 3508.40 samples/sec Loss 8.7908 LearningRate 0.0895 Epoch: 8 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:32,420-Speed 3413.54 samples/sec Loss 8.9148 LearningRate 0.0895 Epoch: 8 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:35,374-Speed 3467.74 samples/sec Loss 8.6510 LearningRate 0.0895 Epoch: 8 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:38,382-Speed 3405.54 samples/sec Loss 8.6946 LearningRate 0.0894 Epoch: 8 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:41,337-Speed 3465.60 samples/sec Loss 8.8409 LearningRate 0.0894 Epoch: 8 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:44,287-Speed 3472.08 samples/sec Loss 8.8154 LearningRate 0.0894 Epoch: 8 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-19 21:15:47,290-Speed 3411.17 samples/sec Loss 8.6926 LearningRate 0.0894 Epoch: 8 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:50,256-Speed 3453.59 samples/sec Loss 8.9232 LearningRate 0.0894 Epoch: 8 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:53,201-Speed 3478.29 samples/sec Loss 8.9159 LearningRate 0.0893 Epoch: 8 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:56,133-Speed 3493.34 samples/sec Loss 8.7936 LearningRate 0.0893 Epoch: 8 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:15:59,073-Speed 3484.21 samples/sec Loss 8.9329 LearningRate 0.0893 Epoch: 8 Global Step: 41960 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-19 21:16:02,056-Speed 3434.74 samples/sec Loss 8.9595 LearningRate 0.0893 Epoch: 8 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:16:05,082-Speed 3383.98 samples/sec Loss 9.0003 LearningRate 0.0892 Epoch: 8 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:16:08,037-Speed 3466.75 samples/sec Loss 8.9699 LearningRate 0.0892 Epoch: 8 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:16:10,991-Speed 3467.65 samples/sec Loss 8.8691 LearningRate 0.0892 Epoch: 8 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:16:53,913-[lfw][42000]XNorm: 22.226234 Training: 2022-01-19 21:16:53,913-[lfw][42000]Accuracy-Flip: 0.99617+-0.00325 Training: 2022-01-19 21:16:53,914-[lfw][42000]Accuracy-Highest: 0.99700 Training: 2022-01-19 21:17:43,657-[cfp_fp][42000]XNorm: 18.797650 Training: 2022-01-19 21:17:43,657-[cfp_fp][42000]Accuracy-Flip: 0.95986+-0.00832 Training: 2022-01-19 21:17:43,658-[cfp_fp][42000]Accuracy-Highest: 0.96214 Training: 2022-01-19 21:18:26,519-[agedb_30][42000]XNorm: 21.907798 Training: 2022-01-19 
21:18:26,520-[agedb_30][42000]Accuracy-Flip: 0.97300+-0.00767 Training: 2022-01-19 21:18:26,521-[agedb_30][42000]Accuracy-Highest: 0.97300 Training: 2022-01-19 21:18:29,454-Speed 73.95 samples/sec Loss 8.7558 LearningRate 0.0892 Epoch: 8 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:32,429-Speed 3443.65 samples/sec Loss 8.8570 LearningRate 0.0892 Epoch: 8 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:35,427-Speed 3416.16 samples/sec Loss 8.8007 LearningRate 0.0891 Epoch: 8 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:38,350-Speed 3504.18 samples/sec Loss 8.7721 LearningRate 0.0891 Epoch: 8 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:41,283-Speed 3493.01 samples/sec Loss 8.9390 LearningRate 0.0891 Epoch: 8 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:44,223-Speed 3483.89 samples/sec Loss 8.8137 LearningRate 0.0891 Epoch: 8 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:47,153-Speed 3495.72 samples/sec Loss 8.9046 LearningRate 0.0891 Epoch: 8 Global Step: 42070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:50,120-Speed 3451.26 samples/sec Loss 8.9159 LearningRate 0.0890 Epoch: 8 Global Step: 42080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:53,070-Speed 3473.58 samples/sec Loss 8.8630 LearningRate 0.0890 Epoch: 8 Global Step: 42090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:56,025-Speed 3465.80 samples/sec Loss 8.6705 LearningRate 0.0890 Epoch: 8 Global Step: 42100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:18:59,054-Speed 3381.36 samples/sec Loss 8.9178 LearningRate 0.0890 Epoch: 8 Global Step: 42110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:02,020-Speed 3453.49 samples/sec Loss 8.7910 
LearningRate 0.0890 Epoch: 8 Global Step: 42120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:04,979-Speed 3461.56 samples/sec Loss 8.8679 LearningRate 0.0889 Epoch: 8 Global Step: 42130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:07,963-Speed 3433.28 samples/sec Loss 8.7815 LearningRate 0.0889 Epoch: 8 Global Step: 42140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:10,896-Speed 3492.20 samples/sec Loss 8.8062 LearningRate 0.0889 Epoch: 8 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:13,856-Speed 3460.98 samples/sec Loss 8.8445 LearningRate 0.0889 Epoch: 8 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:16,860-Speed 3408.93 samples/sec Loss 8.5594 LearningRate 0.0888 Epoch: 8 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:19,827-Speed 3451.63 samples/sec Loss 8.6765 LearningRate 0.0888 Epoch: 8 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:22,791-Speed 3456.24 samples/sec Loss 8.8347 LearningRate 0.0888 Epoch: 8 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:25,727-Speed 3488.86 samples/sec Loss 8.6999 LearningRate 0.0888 Epoch: 8 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:28,736-Speed 3405.20 samples/sec Loss 8.8449 LearningRate 0.0888 Epoch: 8 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:31,706-Speed 3448.09 samples/sec Loss 8.7659 LearningRate 0.0887 Epoch: 8 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:34,643-Speed 3487.54 samples/sec Loss 8.9276 LearningRate 0.0887 Epoch: 8 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:37,587-Speed 3479.63 samples/sec Loss 8.7493 LearningRate 0.0887 Epoch: 8 Global Step: 42240 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:19:40,520-Speed 3492.00 samples/sec Loss 8.8775 LearningRate 0.0887 Epoch: 8 Global Step: 42250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:19:43,453-Speed 3492.24 samples/sec Loss 8.8201 LearningRate 0.0887 Epoch: 8 Global Step: 42260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:19:46,393-Speed 3483.66 samples/sec Loss 8.8724 LearningRate 0.0886 Epoch: 8 Global Step: 42270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:19:49,372-Speed 3438.82 samples/sec Loss 8.7410 LearningRate 0.0886 Epoch: 8 Global Step: 42280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:19:52,313-Speed 3482.28 samples/sec Loss 8.8333 LearningRate 0.0886 Epoch: 8 Global Step: 42290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:19:55,249-Speed 3489.19 samples/sec Loss 8.7888 LearningRate 0.0886 Epoch: 8 Global Step: 42300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:19:58,187-Speed 3486.57 samples/sec Loss 8.8939 LearningRate 0.0886 Epoch: 8 Global Step: 42310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:01,119-Speed 3493.54 samples/sec Loss 8.7102 LearningRate 0.0885 Epoch: 8 Global Step: 42320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:04,056-Speed 3487.29 samples/sec Loss 8.6772 LearningRate 0.0885 Epoch: 8 Global Step: 42330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:07,015-Speed 3461.52 samples/sec Loss 8.6708 LearningRate 0.0885 Epoch: 8 Global Step: 42340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:09,946-Speed 3494.49 samples/sec Loss 8.7692 LearningRate 0.0885 Epoch: 8 Global Step: 42350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:20:12,883-Speed 3487.40 samples/sec Loss 8.9325 LearningRate 0.0884 Epoch: 8 Global Step: 42360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 
21:20:15,817-Speed 3491.23 samples/sec Loss 8.9691 LearningRate 0.0884 Epoch: 8 Global Step: 42370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:20:18,783-Speed 3453.17 samples/sec Loss 8.7472 LearningRate 0.0884 Epoch: 8 Global Step: 42380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:20:21,766-Speed 3434.34 samples/sec Loss 8.7922 LearningRate 0.0884 Epoch: 8 Global Step: 42390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:20:24,740-Speed 3444.22 samples/sec Loss 8.7896 LearningRate 0.0884 Epoch: 8 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:20:27,666-Speed 3500.98 samples/sec Loss 8.8969 LearningRate 0.0883 Epoch: 8 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:30,764-Speed 3306.41 samples/sec Loss 8.7415 LearningRate 0.0883 Epoch: 8 Global Step: 42420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:33,719-Speed 3465.81 samples/sec Loss 8.9263 LearningRate 0.0883 Epoch: 8 Global Step: 42430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:36,660-Speed 3482.86 samples/sec Loss 8.9117 LearningRate 0.0883 Epoch: 8 Global Step: 42440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:39,618-Speed 3462.47 samples/sec Loss 8.8468 LearningRate 0.0883 Epoch: 8 Global Step: 42450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:42,611-Speed 3422.22 samples/sec Loss 8.9495 LearningRate 0.0882 Epoch: 8 Global Step: 42460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:45,586-Speed 3442.63 samples/sec Loss 8.6566 LearningRate 0.0882 Epoch: 8 Global Step: 42470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:48,536-Speed 3472.07 samples/sec Loss 8.9235 LearningRate 0.0882 Epoch: 8 Global Step: 42480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:51,522-Speed 3430.59 samples/sec Loss 8.8270 LearningRate 
0.0882 Epoch: 8 Global Step: 42490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:54,552-Speed 3380.46 samples/sec Loss 8.8158 LearningRate 0.0882 Epoch: 8 Global Step: 42500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-19 21:20:57,486-Speed 3491.67 samples/sec Loss 8.7278 LearningRate 0.0881 Epoch: 8 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:21:00,424-Speed 3485.53 samples/sec Loss 8.7892 LearningRate 0.0881 Epoch: 8 Global Step: 42520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:21:03,413-Speed 3427.44 samples/sec Loss 8.9144 LearningRate 0.0881 Epoch: 8 Global Step: 42530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-19 21:21:06,392-Speed 3438.36 samples/sec Loss 8.7881 LearningRate 0.0881 Epoch: 8 Global Step: 42540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:09,318-Speed 3501.31 samples/sec Loss 8.8173 LearningRate 0.0880 Epoch: 8 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:12,256-Speed 3485.28 samples/sec Loss 8.7752 LearningRate 0.0880 Epoch: 8 Global Step: 42560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:15,198-Speed 3481.63 samples/sec Loss 8.7825 LearningRate 0.0880 Epoch: 8 Global Step: 42570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:18,240-Speed 3367.12 samples/sec Loss 8.8978 LearningRate 0.0880 Epoch: 8 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:21,180-Speed 3484.52 samples/sec Loss 8.5763 LearningRate 0.0880 Epoch: 8 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:24,148-Speed 3450.82 samples/sec Loss 8.5927 LearningRate 0.0879 Epoch: 8 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:27,085-Speed 3487.84 samples/sec Loss 9.0150 LearningRate 0.0879 Epoch: 8 Global Step: 42610 Fp16 Grad Scale: 
262144 Required: 8 hours Training: 2022-01-19 21:21:30,074-Speed 3425.92 samples/sec Loss 8.6955 LearningRate 0.0879 Epoch: 8 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:33,004-Speed 3496.52 samples/sec Loss 8.7615 LearningRate 0.0879 Epoch: 8 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:35,953-Speed 3472.78 samples/sec Loss 8.6250 LearningRate 0.0879 Epoch: 8 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:38,879-Speed 3501.22 samples/sec Loss 8.6839 LearningRate 0.0878 Epoch: 8 Global Step: 42650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:41,836-Speed 3464.22 samples/sec Loss 8.6973 LearningRate 0.0878 Epoch: 8 Global Step: 42660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:21:44,758-Speed 3505.06 samples/sec Loss 8.8072 LearningRate 0.0878 Epoch: 8 Global Step: 42670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:21:47,704-Speed 3477.18 samples/sec Loss 8.8429 LearningRate 0.0878 Epoch: 8 Global Step: 42680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:21:50,637-Speed 3491.46 samples/sec Loss 8.7145 LearningRate 0.0878 Epoch: 8 Global Step: 42690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:21:53,567-Speed 3497.78 samples/sec Loss 8.6984 LearningRate 0.0877 Epoch: 8 Global Step: 42700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:21:56,625-Speed 3348.79 samples/sec Loss 8.7446 LearningRate 0.0877 Epoch: 8 Global Step: 42710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:21:59,601-Speed 3441.90 samples/sec Loss 8.8590 LearningRate 0.0877 Epoch: 8 Global Step: 42720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:02,579-Speed 3439.32 samples/sec Loss 8.7861 LearningRate 0.0877 Epoch: 8 Global Step: 42730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 
21:22:05,537-Speed 3463.20 samples/sec Loss 8.8414 LearningRate 0.0876 Epoch: 8 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:08,587-Speed 3357.66 samples/sec Loss 8.9184 LearningRate 0.0876 Epoch: 8 Global Step: 42750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:11,666-Speed 3326.73 samples/sec Loss 8.6647 LearningRate 0.0876 Epoch: 8 Global Step: 42760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:14,701-Speed 3375.91 samples/sec Loss 8.8658 LearningRate 0.0876 Epoch: 8 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:22:17,765-Speed 3342.90 samples/sec Loss 8.7804 LearningRate 0.0876 Epoch: 8 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:22:20,799-Speed 3375.12 samples/sec Loss 8.7084 LearningRate 0.0875 Epoch: 8 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:22:23,864-Speed 3341.69 samples/sec Loss 8.7670 LearningRate 0.0875 Epoch: 8 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:22:26,876-Speed 3400.90 samples/sec Loss 8.7945 LearningRate 0.0875 Epoch: 8 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:22:29,826-Speed 3472.76 samples/sec Loss 8.7935 LearningRate 0.0875 Epoch: 8 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:22:32,818-Speed 3422.46 samples/sec Loss 8.7659 LearningRate 0.0875 Epoch: 8 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:22:35,751-Speed 3492.73 samples/sec Loss 8.6638 LearningRate 0.0874 Epoch: 8 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:22:38,689-Speed 3486.90 samples/sec Loss 8.8790 LearningRate 0.0874 Epoch: 8 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:41,701-Speed 3400.19 samples/sec Loss 8.8147 
LearningRate 0.0874 Epoch: 8 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:44,641-Speed 3484.29 samples/sec Loss 8.8024 LearningRate 0.0874 Epoch: 8 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:47,567-Speed 3500.24 samples/sec Loss 8.8285 LearningRate 0.0874 Epoch: 8 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:50,501-Speed 3491.46 samples/sec Loss 8.8625 LearningRate 0.0873 Epoch: 8 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:53,432-Speed 3494.60 samples/sec Loss 8.6525 LearningRate 0.0873 Epoch: 8 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:56,423-Speed 3425.15 samples/sec Loss 8.6836 LearningRate 0.0873 Epoch: 8 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:22:59,375-Speed 3470.47 samples/sec Loss 8.8260 LearningRate 0.0873 Epoch: 8 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:23:02,327-Speed 3470.27 samples/sec Loss 8.8519 LearningRate 0.0873 Epoch: 8 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:23:05,297-Speed 3447.76 samples/sec Loss 8.7221 LearningRate 0.0872 Epoch: 8 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:23:08,240-Speed 3480.72 samples/sec Loss 8.6767 LearningRate 0.0872 Epoch: 8 Global Step: 42950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:11,173-Speed 3493.11 samples/sec Loss 8.7805 LearningRate 0.0872 Epoch: 8 Global Step: 42960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:14,114-Speed 3482.62 samples/sec Loss 8.7491 LearningRate 0.0872 Epoch: 8 Global Step: 42970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:17,073-Speed 3461.85 samples/sec Loss 8.7356 LearningRate 0.0871 Epoch: 8 Global Step: 42980 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-01-19 21:23:20,009-Speed 3487.46 samples/sec Loss 8.7180 LearningRate 0.0871 Epoch: 8 Global Step: 42990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:22,954-Speed 3478.86 samples/sec Loss 8.7047 LearningRate 0.0871 Epoch: 8 Global Step: 43000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:25,902-Speed 3474.48 samples/sec Loss 8.6572 LearningRate 0.0871 Epoch: 8 Global Step: 43010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:28,844-Speed 3481.23 samples/sec Loss 8.7199 LearningRate 0.0871 Epoch: 8 Global Step: 43020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:31,903-Speed 3348.31 samples/sec Loss 8.7311 LearningRate 0.0870 Epoch: 8 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:34,851-Speed 3474.51 samples/sec Loss 8.7201 LearningRate 0.0870 Epoch: 8 Global Step: 43040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:37,791-Speed 3484.83 samples/sec Loss 8.7522 LearningRate 0.0870 Epoch: 8 Global Step: 43050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:23:40,724-Speed 3492.00 samples/sec Loss 8.5566 LearningRate 0.0870 Epoch: 8 Global Step: 43060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:23:43,657-Speed 3492.32 samples/sec Loss 8.6524 LearningRate 0.0870 Epoch: 8 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:46,589-Speed 3493.93 samples/sec Loss 8.4501 LearningRate 0.0869 Epoch: 8 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:49,536-Speed 3475.11 samples/sec Loss 8.8433 LearningRate 0.0869 Epoch: 8 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:52,541-Speed 3408.82 samples/sec Loss 8.8968 LearningRate 0.0869 Epoch: 8 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 
21:23:55,594-Speed 3354.90 samples/sec Loss 8.6562 LearningRate 0.0869 Epoch: 8 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:23:58,532-Speed 3486.10 samples/sec Loss 8.6850 LearningRate 0.0869 Epoch: 8 Global Step: 43120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:01,575-Speed 3366.18 samples/sec Loss 8.5705 LearningRate 0.0868 Epoch: 8 Global Step: 43130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:04,498-Speed 3503.65 samples/sec Loss 8.7644 LearningRate 0.0868 Epoch: 8 Global Step: 43140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:07,429-Speed 3495.26 samples/sec Loss 8.7343 LearningRate 0.0868 Epoch: 8 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:10,385-Speed 3464.58 samples/sec Loss 8.7649 LearningRate 0.0868 Epoch: 8 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:13,343-Speed 3463.27 samples/sec Loss 8.9472 LearningRate 0.0868 Epoch: 8 Global Step: 43170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:16,385-Speed 3366.77 samples/sec Loss 8.7462 LearningRate 0.0867 Epoch: 8 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:19,328-Speed 3481.10 samples/sec Loss 8.8797 LearningRate 0.0867 Epoch: 8 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:22,298-Speed 3448.66 samples/sec Loss 8.6623 LearningRate 0.0867 Epoch: 8 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:25,356-Speed 3349.02 samples/sec Loss 8.6877 LearningRate 0.0867 Epoch: 8 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:28,314-Speed 3462.45 samples/sec Loss 8.6867 LearningRate 0.0866 Epoch: 8 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:31,263-Speed 3472.81 samples/sec Loss 8.7368 LearningRate 
0.0866 Epoch: 8 Global Step: 43230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:24:34,259-Speed 3419.91 samples/sec Loss 8.6366 LearningRate 0.0866 Epoch: 8 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:37,279-Speed 3392.37 samples/sec Loss 8.9161 LearningRate 0.0866 Epoch: 8 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:40,237-Speed 3462.19 samples/sec Loss 8.6837 LearningRate 0.0866 Epoch: 8 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:43,202-Speed 3456.11 samples/sec Loss 8.8533 LearningRate 0.0865 Epoch: 8 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:46,140-Speed 3485.97 samples/sec Loss 8.5993 LearningRate 0.0865 Epoch: 8 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:49,083-Speed 3480.45 samples/sec Loss 8.7654 LearningRate 0.0865 Epoch: 8 Global Step: 43290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:52,018-Speed 3489.35 samples/sec Loss 8.6335 LearningRate 0.0865 Epoch: 8 Global Step: 43300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:54,954-Speed 3489.01 samples/sec Loss 8.8255 LearningRate 0.0865 Epoch: 8 Global Step: 43310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:24:57,907-Speed 3467.71 samples/sec Loss 8.7279 LearningRate 0.0864 Epoch: 8 Global Step: 43320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:00,859-Speed 3469.69 samples/sec Loss 8.5152 LearningRate 0.0864 Epoch: 8 Global Step: 43330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:03,808-Speed 3474.55 samples/sec Loss 8.8454 LearningRate 0.0864 Epoch: 8 Global Step: 43340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:25:06,759-Speed 3470.65 samples/sec Loss 8.7175 LearningRate 0.0864 Epoch: 8 Global Step: 43350 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-01-19 21:25:09,692-Speed 3492.79 samples/sec Loss 8.7974 LearningRate 0.0864 Epoch: 8 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:12,637-Speed 3477.11 samples/sec Loss 8.6584 LearningRate 0.0863 Epoch: 8 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:15,702-Speed 3341.77 samples/sec Loss 8.5659 LearningRate 0.0863 Epoch: 8 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:18,643-Speed 3482.91 samples/sec Loss 8.7296 LearningRate 0.0863 Epoch: 8 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:21,674-Speed 3379.53 samples/sec Loss 8.6468 LearningRate 0.0863 Epoch: 8 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:24,618-Speed 3478.83 samples/sec Loss 8.6405 LearningRate 0.0863 Epoch: 8 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:27,626-Speed 3405.02 samples/sec Loss 8.6462 LearningRate 0.0862 Epoch: 8 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:25:30,664-Speed 3371.76 samples/sec Loss 8.7507 LearningRate 0.0862 Epoch: 8 Global Step: 43430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:25:33,646-Speed 3435.18 samples/sec Loss 8.7292 LearningRate 0.0862 Epoch: 8 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:25:36,580-Speed 3491.74 samples/sec Loss 8.7672 LearningRate 0.0862 Epoch: 8 Global Step: 43450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:25:39,534-Speed 3467.17 samples/sec Loss 8.6421 LearningRate 0.0861 Epoch: 8 Global Step: 43460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:25:42,576-Speed 3366.67 samples/sec Loss 8.8244 LearningRate 0.0861 Epoch: 8 Global Step: 43470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 
21:25:45,575-Speed 3415.32 samples/sec Loss 8.8520 LearningRate 0.0861 Epoch: 8 Global Step: 43480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:25:48,545-Speed 3448.68 samples/sec Loss 8.8094 LearningRate 0.0861 Epoch: 8 Global Step: 43490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:25:51,578-Speed 3377.61 samples/sec Loss 8.8105 LearningRate 0.0861 Epoch: 8 Global Step: 43500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:25:54,514-Speed 3488.04 samples/sec Loss 8.6803 LearningRate 0.0860 Epoch: 8 Global Step: 43510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:25:57,462-Speed 3475.38 samples/sec Loss 8.6110 LearningRate 0.0860 Epoch: 8 Global Step: 43520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:26:00,406-Speed 3478.50 samples/sec Loss 8.7035 LearningRate 0.0860 Epoch: 8 Global Step: 43530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:26:03,456-Speed 3359.11 samples/sec Loss 8.7244 LearningRate 0.0860 Epoch: 8 Global Step: 43540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:26:06,441-Speed 3430.68 samples/sec Loss 8.7758 LearningRate 0.0860 Epoch: 8 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:09,376-Speed 3490.63 samples/sec Loss 8.7764 LearningRate 0.0859 Epoch: 8 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:12,360-Speed 3431.76 samples/sec Loss 8.6790 LearningRate 0.0859 Epoch: 8 Global Step: 43570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:15,318-Speed 3464.19 samples/sec Loss 8.6849 LearningRate 0.0859 Epoch: 8 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:18,295-Speed 3440.20 samples/sec Loss 8.6824 LearningRate 0.0859 Epoch: 8 Global Step: 43590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:21,259-Speed 3455.52 samples/sec Loss 8.6693 LearningRate 
0.0859 Epoch: 8 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:24,253-Speed 3421.83 samples/sec Loss 8.6270 LearningRate 0.0858 Epoch: 8 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:27,197-Speed 3478.91 samples/sec Loss 8.6154 LearningRate 0.0858 Epoch: 8 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:30,231-Speed 3375.69 samples/sec Loss 8.7082 LearningRate 0.0858 Epoch: 8 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:33,205-Speed 3443.81 samples/sec Loss 8.6855 LearningRate 0.0858 Epoch: 8 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:26:36,200-Speed 3420.41 samples/sec Loss 8.7255 LearningRate 0.0858 Epoch: 8 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:26:39,191-Speed 3425.12 samples/sec Loss 8.7434 LearningRate 0.0857 Epoch: 8 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:26:42,130-Speed 3484.66 samples/sec Loss 8.8677 LearningRate 0.0857 Epoch: 8 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:26:45,133-Speed 3410.93 samples/sec Loss 8.7553 LearningRate 0.0857 Epoch: 8 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:26:48,204-Speed 3334.66 samples/sec Loss 8.7585 LearningRate 0.0857 Epoch: 8 Global Step: 43690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:26:51,223-Speed 3392.88 samples/sec Loss 8.8065 LearningRate 0.0857 Epoch: 8 Global Step: 43700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:26:54,196-Speed 3445.74 samples/sec Loss 8.7669 LearningRate 0.0856 Epoch: 8 Global Step: 43710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:26:57,194-Speed 3416.46 samples/sec Loss 8.6980 LearningRate 0.0856 Epoch: 8 Global Step: 43720 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-19 21:27:00,185-Speed 3424.81 samples/sec Loss 8.7439 LearningRate 0.0856 Epoch: 8 Global Step: 43730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:03,128-Speed 3479.83 samples/sec Loss 8.7773 LearningRate 0.0856 Epoch: 8 Global Step: 43740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:06,099-Speed 3447.93 samples/sec Loss 8.7863 LearningRate 0.0855 Epoch: 8 Global Step: 43750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:09,038-Speed 3485.36 samples/sec Loss 8.8837 LearningRate 0.0855 Epoch: 8 Global Step: 43760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:12,008-Speed 3448.74 samples/sec Loss 8.5396 LearningRate 0.0855 Epoch: 8 Global Step: 43770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:14,946-Speed 3486.39 samples/sec Loss 8.9015 LearningRate 0.0855 Epoch: 8 Global Step: 43780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:17,907-Speed 3458.57 samples/sec Loss 8.7280 LearningRate 0.0855 Epoch: 8 Global Step: 43790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:20,871-Speed 3456.55 samples/sec Loss 8.8015 LearningRate 0.0854 Epoch: 8 Global Step: 43800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:23,819-Speed 3473.77 samples/sec Loss 8.7548 LearningRate 0.0854 Epoch: 8 Global Step: 43810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:26,818-Speed 3415.87 samples/sec Loss 8.5872 LearningRate 0.0854 Epoch: 8 Global Step: 43820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:29,836-Speed 3393.73 samples/sec Loss 8.6178 LearningRate 0.0854 Epoch: 8 Global Step: 43830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:32,784-Speed 3474.04 samples/sec Loss 8.5762 LearningRate 0.0854 Epoch: 8 Global Step: 43840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:35,732-Speed 
3475.07 samples/sec Loss 8.6807 LearningRate 0.0853 Epoch: 8 Global Step: 43850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:38,705-Speed 3445.13 samples/sec Loss 8.5449 LearningRate 0.0853 Epoch: 8 Global Step: 43860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:41,689-Speed 3432.60 samples/sec Loss 8.4528 LearningRate 0.0853 Epoch: 8 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:27:44,626-Speed 3487.97 samples/sec Loss 8.5355 LearningRate 0.0853 Epoch: 8 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:47,569-Speed 3480.13 samples/sec Loss 8.6577 LearningRate 0.0853 Epoch: 8 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:50,511-Speed 3481.78 samples/sec Loss 8.5293 LearningRate 0.0852 Epoch: 8 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:53,479-Speed 3451.32 samples/sec Loss 8.5924 LearningRate 0.0852 Epoch: 8 Global Step: 43910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:56,516-Speed 3372.35 samples/sec Loss 8.5551 LearningRate 0.0852 Epoch: 8 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:27:59,526-Speed 3404.15 samples/sec Loss 8.5400 LearningRate 0.0852 Epoch: 8 Global Step: 43930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:28:02,463-Speed 3487.91 samples/sec Loss 8.7041 LearningRate 0.0852 Epoch: 8 Global Step: 43940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:28:05,406-Speed 3480.08 samples/sec Loss 8.7696 LearningRate 0.0851 Epoch: 8 Global Step: 43950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:28:08,344-Speed 3485.61 samples/sec Loss 8.6668 LearningRate 0.0851 Epoch: 8 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:28:11,421-Speed 3328.77 samples/sec Loss 8.5853 LearningRate 0.0851 Epoch: 
8 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:28:14,382-Speed 3459.19 samples/sec Loss 8.5869 LearningRate 0.0851 Epoch: 8 Global Step: 43980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:28:17,324-Speed 3482.32 samples/sec Loss 8.5200 LearningRate 0.0851 Epoch: 8 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:28:20,274-Speed 3471.52 samples/sec Loss 8.8162 LearningRate 0.0850 Epoch: 8 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:29:03,529-[lfw][44000]XNorm: 23.315155 Training: 2022-01-19 21:29:03,530-[lfw][44000]Accuracy-Flip: 0.99667+-0.00279 Training: 2022-01-19 21:29:03,530-[lfw][44000]Accuracy-Highest: 0.99700 Training: 2022-01-19 21:29:53,670-[cfp_fp][44000]XNorm: 20.170753 Training: 2022-01-19 21:29:53,670-[cfp_fp][44000]Accuracy-Flip: 0.96200+-0.00990 Training: 2022-01-19 21:29:53,671-[cfp_fp][44000]Accuracy-Highest: 0.96214 Training: 2022-01-19 21:30:36,858-[agedb_30][44000]XNorm: 22.982468 Training: 2022-01-19 21:30:36,859-[agedb_30][44000]Accuracy-Flip: 0.97350+-0.00647 Training: 2022-01-19 21:30:36,859-[agedb_30][44000]Accuracy-Highest: 0.97350 Training: 2022-01-19 21:30:39,794-Speed 73.40 samples/sec Loss 8.6741 LearningRate 0.0850 Epoch: 8 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:30:42,766-Speed 3446.13 samples/sec Loss 8.7566 LearningRate 0.0850 Epoch: 8 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:30:45,696-Speed 3495.87 samples/sec Loss 8.7905 LearningRate 0.0850 Epoch: 8 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:30:48,631-Speed 3490.55 samples/sec Loss 8.7939 LearningRate 0.0849 Epoch: 8 Global Step: 44040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:30:51,555-Speed 3503.77 samples/sec Loss 8.6545 LearningRate 0.0849 Epoch: 8 Global Step: 44050 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-19 21:30:54,523-Speed 3450.53 samples/sec Loss 8.7496 LearningRate 0.0849 Epoch: 8 Global Step: 44060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:30:57,610-Speed 3318.07 samples/sec Loss 8.6012 LearningRate 0.0849 Epoch: 8 Global Step: 44070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:00,665-Speed 3351.98 samples/sec Loss 8.7112 LearningRate 0.0849 Epoch: 8 Global Step: 44080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:03,730-Speed 3342.06 samples/sec Loss 8.5894 LearningRate 0.0848 Epoch: 8 Global Step: 44090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:06,681-Speed 3471.47 samples/sec Loss 8.6113 LearningRate 0.0848 Epoch: 8 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:09,618-Speed 3492.34 samples/sec Loss 8.6736 LearningRate 0.0848 Epoch: 8 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:12,604-Speed 3430.23 samples/sec Loss 8.6503 LearningRate 0.0848 Epoch: 8 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:15,551-Speed 3475.03 samples/sec Loss 8.6266 LearningRate 0.0848 Epoch: 8 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:18,601-Speed 3359.28 samples/sec Loss 8.6569 LearningRate 0.0847 Epoch: 8 Global Step: 44140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:21,609-Speed 3404.65 samples/sec Loss 8.7630 LearningRate 0.0847 Epoch: 8 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:24,562-Speed 3468.94 samples/sec Loss 8.5343 LearningRate 0.0847 Epoch: 8 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:27,559-Speed 3417.42 samples/sec Loss 8.5954 LearningRate 0.0847 Epoch: 8 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:30,505-Speed 3476.16 
samples/sec Loss 8.6168 LearningRate 0.0847 Epoch: 8 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:33,471-Speed 3454.18 samples/sec Loss 8.5487 LearningRate 0.0846 Epoch: 8 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:36,483-Speed 3399.71 samples/sec Loss 8.6578 LearningRate 0.0846 Epoch: 8 Global Step: 44200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:39,452-Speed 3450.74 samples/sec Loss 8.6644 LearningRate 0.0846 Epoch: 8 Global Step: 44210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:42,454-Speed 3411.67 samples/sec Loss 8.6737 LearningRate 0.0846 Epoch: 8 Global Step: 44220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:45,489-Speed 3374.96 samples/sec Loss 8.8556 LearningRate 0.0846 Epoch: 8 Global Step: 44230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:31:48,455-Speed 3453.84 samples/sec Loss 8.6116 LearningRate 0.0845 Epoch: 8 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:31:51,413-Speed 3462.87 samples/sec Loss 8.7739 LearningRate 0.0845 Epoch: 8 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:31:54,378-Speed 3455.15 samples/sec Loss 8.5807 LearningRate 0.0845 Epoch: 8 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:31:57,350-Speed 3445.93 samples/sec Loss 8.6596 LearningRate 0.0845 Epoch: 8 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:00,318-Speed 3450.94 samples/sec Loss 8.7144 LearningRate 0.0845 Epoch: 8 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:03,254-Speed 3489.10 samples/sec Loss 8.6153 LearningRate 0.0844 Epoch: 8 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:06,194-Speed 3483.69 samples/sec Loss 8.5713 LearningRate 0.0844 Epoch: 8 Global 
Step: 44300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:09,133-Speed 3484.97 samples/sec Loss 8.6032 LearningRate 0.0844 Epoch: 8 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:12,091-Speed 3462.88 samples/sec Loss 8.6888 LearningRate 0.0844 Epoch: 8 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:15,042-Speed 3470.91 samples/sec Loss 8.6910 LearningRate 0.0844 Epoch: 8 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:17,979-Speed 3487.81 samples/sec Loss 8.8671 LearningRate 0.0843 Epoch: 8 Global Step: 44340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:32:20,938-Speed 3460.85 samples/sec Loss 8.5306 LearningRate 0.0843 Epoch: 8 Global Step: 44350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:32:23,883-Speed 3477.96 samples/sec Loss 8.5787 LearningRate 0.0843 Epoch: 8 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:26,893-Speed 3403.91 samples/sec Loss 8.6934 LearningRate 0.0843 Epoch: 8 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:29,958-Speed 3351.15 samples/sec Loss 8.6418 LearningRate 0.0842 Epoch: 8 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:32,952-Speed 3421.47 samples/sec Loss 8.6094 LearningRate 0.0842 Epoch: 8 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:35,938-Speed 3429.80 samples/sec Loss 8.4156 LearningRate 0.0842 Epoch: 8 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:38,913-Speed 3444.06 samples/sec Loss 8.6345 LearningRate 0.0842 Epoch: 8 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:41,863-Speed 3472.81 samples/sec Loss 8.6853 LearningRate 0.0842 Epoch: 8 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-01-19 21:32:44,793-Speed 3495.64 samples/sec Loss 8.6212 LearningRate 0.0841 Epoch: 8 Global Step: 44430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:47,733-Speed 3483.95 samples/sec Loss 8.4826 LearningRate 0.0841 Epoch: 8 Global Step: 44440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:50,689-Speed 3464.19 samples/sec Loss 8.5742 LearningRate 0.0841 Epoch: 8 Global Step: 44450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:53,645-Speed 3465.32 samples/sec Loss 8.6958 LearningRate 0.0841 Epoch: 8 Global Step: 44460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:32:56,572-Speed 3499.85 samples/sec Loss 8.7146 LearningRate 0.0841 Epoch: 8 Global Step: 44470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:32:59,514-Speed 3481.08 samples/sec Loss 8.6671 LearningRate 0.0840 Epoch: 8 Global Step: 44480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:02,583-Speed 3337.63 samples/sec Loss 8.7070 LearningRate 0.0840 Epoch: 8 Global Step: 44490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:05,592-Speed 3404.49 samples/sec Loss 8.5821 LearningRate 0.0840 Epoch: 8 Global Step: 44500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:08,524-Speed 3493.31 samples/sec Loss 8.5105 LearningRate 0.0840 Epoch: 8 Global Step: 44510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:11,572-Speed 3360.81 samples/sec Loss 8.5866 LearningRate 0.0840 Epoch: 8 Global Step: 44520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:14,528-Speed 3465.32 samples/sec Loss 8.5587 LearningRate 0.0839 Epoch: 8 Global Step: 44530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:17,460-Speed 3493.21 samples/sec Loss 8.7427 LearningRate 0.0839 Epoch: 8 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:20,513-Speed 3354.72 
samples/sec Loss 8.5653 LearningRate 0.0839 Epoch: 8 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:23,462-Speed 3473.30 samples/sec Loss 8.6717 LearningRate 0.0839 Epoch: 8 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:26,456-Speed 3420.67 samples/sec Loss 8.7908 LearningRate 0.0839 Epoch: 8 Global Step: 44570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:29,428-Speed 3447.80 samples/sec Loss 8.6938 LearningRate 0.0838 Epoch: 8 Global Step: 44580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:32,460-Speed 3377.63 samples/sec Loss 8.7187 LearningRate 0.0838 Epoch: 8 Global Step: 44590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:35,479-Speed 3392.01 samples/sec Loss 8.6504 LearningRate 0.0838 Epoch: 8 Global Step: 44600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:38,468-Speed 3427.31 samples/sec Loss 8.4567 LearningRate 0.0838 Epoch: 8 Global Step: 44610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:41,514-Speed 3362.80 samples/sec Loss 8.6175 LearningRate 0.0838 Epoch: 8 Global Step: 44620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:44,441-Speed 3499.43 samples/sec Loss 8.4661 LearningRate 0.0837 Epoch: 8 Global Step: 44630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:47,410-Speed 3449.99 samples/sec Loss 8.6475 LearningRate 0.0837 Epoch: 8 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:50,356-Speed 3476.18 samples/sec Loss 8.4486 LearningRate 0.0837 Epoch: 8 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:53,298-Speed 3481.42 samples/sec Loss 8.4681 LearningRate 0.0837 Epoch: 8 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:33:56,238-Speed 3483.85 samples/sec Loss 8.6160 LearningRate 0.0837 Epoch: 8 Global 
Step: 44670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:33:59,178-Speed 3485.66 samples/sec Loss 8.6757 LearningRate 0.0836 Epoch: 8 Global Step: 44680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:02,173-Speed 3420.05 samples/sec Loss 8.6233 LearningRate 0.0836 Epoch: 8 Global Step: 44690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:05,119-Speed 3476.37 samples/sec Loss 8.5677 LearningRate 0.0836 Epoch: 8 Global Step: 44700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:08,082-Speed 3456.81 samples/sec Loss 8.4656 LearningRate 0.0836 Epoch: 8 Global Step: 44710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:11,037-Speed 3466.12 samples/sec Loss 8.6179 LearningRate 0.0836 Epoch: 8 Global Step: 44720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:14,017-Speed 3436.96 samples/sec Loss 8.6580 LearningRate 0.0835 Epoch: 8 Global Step: 44730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:16,999-Speed 3435.08 samples/sec Loss 8.6117 LearningRate 0.0835 Epoch: 8 Global Step: 44740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:19,965-Speed 3453.03 samples/sec Loss 8.6464 LearningRate 0.0835 Epoch: 8 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:22,900-Speed 3490.13 samples/sec Loss 8.6347 LearningRate 0.0835 Epoch: 8 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:34:25,873-Speed 3445.74 samples/sec Loss 8.6456 LearningRate 0.0834 Epoch: 8 Global Step: 44770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:28,806-Speed 3492.93 samples/sec Loss 8.6820 LearningRate 0.0834 Epoch: 8 Global Step: 44780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:31,746-Speed 3483.53 samples/sec Loss 8.5357 LearningRate 0.0834 Epoch: 8 Global Step: 44790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-19 21:34:34,695-Speed 3473.64 samples/sec Loss 8.6504 LearningRate 0.0834 Epoch: 8 Global Step: 44800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:37,631-Speed 3488.22 samples/sec Loss 8.5725 LearningRate 0.0834 Epoch: 8 Global Step: 44810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:40,610-Speed 3437.25 samples/sec Loss 8.5047 LearningRate 0.0833 Epoch: 8 Global Step: 44820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:43,614-Speed 3411.55 samples/sec Loss 8.6992 LearningRate 0.0833 Epoch: 8 Global Step: 44830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:46,591-Speed 3439.71 samples/sec Loss 8.5780 LearningRate 0.0833 Epoch: 8 Global Step: 44840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:49,605-Speed 3398.84 samples/sec Loss 8.5964 LearningRate 0.0833 Epoch: 8 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:52,584-Speed 3438.16 samples/sec Loss 8.5143 LearningRate 0.0833 Epoch: 8 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:55,540-Speed 3465.91 samples/sec Loss 8.5546 LearningRate 0.0832 Epoch: 8 Global Step: 44870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:34:58,478-Speed 3486.48 samples/sec Loss 8.6113 LearningRate 0.0832 Epoch: 8 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:01,428-Speed 3471.73 samples/sec Loss 8.5377 LearningRate 0.0832 Epoch: 8 Global Step: 44890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:04,483-Speed 3352.80 samples/sec Loss 8.4651 LearningRate 0.0832 Epoch: 8 Global Step: 44900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:07,459-Speed 3441.79 samples/sec Loss 8.7524 LearningRate 0.0832 Epoch: 8 Global Step: 44910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:10,434-Speed 3442.57 samples/sec Loss 
8.5841 LearningRate 0.0831 Epoch: 8 Global Step: 44920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:13,424-Speed 3426.16 samples/sec Loss 8.5733 LearningRate 0.0831 Epoch: 8 Global Step: 44930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:16,411-Speed 3429.04 samples/sec Loss 8.6871 LearningRate 0.0831 Epoch: 8 Global Step: 44940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:19,351-Speed 3485.26 samples/sec Loss 8.6051 LearningRate 0.0831 Epoch: 8 Global Step: 44950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:22,299-Speed 3475.03 samples/sec Loss 8.6040 LearningRate 0.0831 Epoch: 8 Global Step: 44960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:25,235-Speed 3488.42 samples/sec Loss 8.5458 LearningRate 0.0830 Epoch: 8 Global Step: 44970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:35:28,176-Speed 3481.92 samples/sec Loss 8.5916 LearningRate 0.0830 Epoch: 8 Global Step: 44980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:31,261-Speed 3320.45 samples/sec Loss 8.5958 LearningRate 0.0830 Epoch: 8 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:34,231-Speed 3449.08 samples/sec Loss 8.4974 LearningRate 0.0830 Epoch: 8 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:37,190-Speed 3461.05 samples/sec Loss 8.5217 LearningRate 0.0830 Epoch: 8 Global Step: 45010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:40,121-Speed 3495.16 samples/sec Loss 8.5510 LearningRate 0.0829 Epoch: 8 Global Step: 45020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:43,077-Speed 3464.85 samples/sec Loss 8.5466 LearningRate 0.0829 Epoch: 8 Global Step: 45030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:46,044-Speed 3452.47 samples/sec Loss 8.6684 LearningRate 0.0829 Epoch: 8 Global Step: 45040 
Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:49,041-Speed 3417.82 samples/sec Loss 8.5859 LearningRate 0.0829 Epoch: 8 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:51,986-Speed 3478.01 samples/sec Loss 8.4125 LearningRate 0.0829 Epoch: 8 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:54,942-Speed 3465.75 samples/sec Loss 8.5929 LearningRate 0.0828 Epoch: 8 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:35:57,884-Speed 3480.30 samples/sec Loss 8.3950 LearningRate 0.0828 Epoch: 8 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:00,830-Speed 3476.85 samples/sec Loss 8.4868 LearningRate 0.0828 Epoch: 8 Global Step: 45090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:03,796-Speed 3453.84 samples/sec Loss 8.6151 LearningRate 0.0828 Epoch: 8 Global Step: 45100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:06,734-Speed 3485.73 samples/sec Loss 8.6142 LearningRate 0.0828 Epoch: 8 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:09,679-Speed 3478.91 samples/sec Loss 8.5061 LearningRate 0.0827 Epoch: 8 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:12,629-Speed 3471.92 samples/sec Loss 8.3490 LearningRate 0.0827 Epoch: 8 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:15,608-Speed 3437.58 samples/sec Loss 8.4770 LearningRate 0.0827 Epoch: 8 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:18,603-Speed 3420.29 samples/sec Loss 8.5861 LearningRate 0.0827 Epoch: 8 Global Step: 45150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:21,539-Speed 3489.07 samples/sec Loss 8.4883 LearningRate 0.0827 Epoch: 8 Global Step: 45160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-19 21:36:24,493-Speed 3467.69 samples/sec Loss 8.6272 LearningRate 0.0826 Epoch: 8 Global Step: 45170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:27,427-Speed 3490.66 samples/sec Loss 8.4553 LearningRate 0.0826 Epoch: 8 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:30,379-Speed 3469.47 samples/sec Loss 8.5366 LearningRate 0.0826 Epoch: 8 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:33,313-Speed 3490.64 samples/sec Loss 8.4581 LearningRate 0.0826 Epoch: 8 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:36,281-Speed 3451.65 samples/sec Loss 8.4803 LearningRate 0.0826 Epoch: 8 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:39,217-Speed 3488.02 samples/sec Loss 8.6185 LearningRate 0.0825 Epoch: 8 Global Step: 45220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:42,250-Speed 3377.54 samples/sec Loss 8.6212 LearningRate 0.0825 Epoch: 8 Global Step: 45230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:45,194-Speed 3479.90 samples/sec Loss 8.6553 LearningRate 0.0825 Epoch: 8 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:48,250-Speed 3351.69 samples/sec Loss 8.5701 LearningRate 0.0825 Epoch: 8 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:51,314-Speed 3342.64 samples/sec Loss 8.5380 LearningRate 0.0825 Epoch: 8 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:54,249-Speed 3489.65 samples/sec Loss 8.5082 LearningRate 0.0824 Epoch: 8 Global Step: 45270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:36:57,184-Speed 3489.75 samples/sec Loss 8.6446 LearningRate 0.0824 Epoch: 8 Global Step: 45280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:37:00,189-Speed 3408.40 samples/sec Loss 
8.4361 LearningRate 0.0824 Epoch: 8 Global Step: 45290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:03,152-Speed 3456.81 samples/sec Loss 8.4668 LearningRate 0.0824 Epoch: 8 Global Step: 45300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:06,102-Speed 3471.84 samples/sec Loss 8.6205 LearningRate 0.0823 Epoch: 8 Global Step: 45310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:09,042-Speed 3484.60 samples/sec Loss 8.4164 LearningRate 0.0823 Epoch: 8 Global Step: 45320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:12,052-Speed 3402.75 samples/sec Loss 8.5806 LearningRate 0.0823 Epoch: 8 Global Step: 45330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:15,063-Speed 3402.34 samples/sec Loss 8.4590 LearningRate 0.0823 Epoch: 8 Global Step: 45340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:18,057-Speed 3420.72 samples/sec Loss 8.3921 LearningRate 0.0823 Epoch: 8 Global Step: 45350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:20,991-Speed 3490.87 samples/sec Loss 8.4812 LearningRate 0.0822 Epoch: 8 Global Step: 45360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:23,939-Speed 3474.93 samples/sec Loss 8.6079 LearningRate 0.0822 Epoch: 8 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:26,868-Speed 3497.33 samples/sec Loss 8.5407 LearningRate 0.0822 Epoch: 8 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:29,830-Speed 3458.26 samples/sec Loss 8.6904 LearningRate 0.0822 Epoch: 8 Global Step: 45390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:37:32,759-Speed 3495.84 samples/sec Loss 8.6334 LearningRate 0.0822 Epoch: 8 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:35,697-Speed 3486.99 samples/sec Loss 8.4119 LearningRate 0.0821 Epoch: 8 Global Step: 45410 
Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:38,641-Speed 3479.36 samples/sec Loss 8.4056 LearningRate 0.0821 Epoch: 8 Global Step: 45420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:41,577-Speed 3488.89 samples/sec Loss 8.5955 LearningRate 0.0821 Epoch: 8 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:44,511-Speed 3491.02 samples/sec Loss 8.5524 LearningRate 0.0821 Epoch: 8 Global Step: 45440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:47,463-Speed 3469.81 samples/sec Loss 8.6047 LearningRate 0.0821 Epoch: 8 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:50,446-Speed 3433.48 samples/sec Loss 8.6080 LearningRate 0.0820 Epoch: 8 Global Step: 45460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:53,406-Speed 3460.99 samples/sec Loss 8.5922 LearningRate 0.0820 Epoch: 8 Global Step: 45470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:56,370-Speed 3454.68 samples/sec Loss 8.5766 LearningRate 0.0820 Epoch: 8 Global Step: 45480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:37:59,357-Speed 3429.97 samples/sec Loss 8.7144 LearningRate 0.0820 Epoch: 8 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:38:02,298-Speed 3481.55 samples/sec Loss 8.6115 LearningRate 0.0820 Epoch: 8 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:38:05,230-Speed 3493.85 samples/sec Loss 8.4544 LearningRate 0.0819 Epoch: 8 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:08,231-Speed 3413.32 samples/sec Loss 8.4086 LearningRate 0.0819 Epoch: 8 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:21,793-Speed 755.14 samples/sec Loss 7.9686 LearningRate 0.0819 Epoch: 9 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-01-19 21:38:24,728-Speed 3489.24 samples/sec Loss 7.6886 LearningRate 0.0819 Epoch: 9 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:27,702-Speed 3444.15 samples/sec Loss 7.8541 LearningRate 0.0819 Epoch: 9 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:30,701-Speed 3415.39 samples/sec Loss 7.7843 LearningRate 0.0818 Epoch: 9 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:33,658-Speed 3465.13 samples/sec Loss 7.8459 LearningRate 0.0818 Epoch: 9 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:36,611-Speed 3467.52 samples/sec Loss 7.7538 LearningRate 0.0818 Epoch: 9 Global Step: 45580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:39,628-Speed 3395.35 samples/sec Loss 7.7977 LearningRate 0.0818 Epoch: 9 Global Step: 45590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:42,657-Speed 3380.98 samples/sec Loss 7.8736 LearningRate 0.0818 Epoch: 9 Global Step: 45600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:38:45,605-Speed 3475.57 samples/sec Loss 7.9712 LearningRate 0.0817 Epoch: 9 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:38:48,537-Speed 3493.72 samples/sec Loss 7.7142 LearningRate 0.0817 Epoch: 9 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:38:51,487-Speed 3472.16 samples/sec Loss 7.7768 LearningRate 0.0817 Epoch: 9 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:38:54,420-Speed 3492.33 samples/sec Loss 7.8930 LearningRate 0.0817 Epoch: 9 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:38:57,402-Speed 3434.61 samples/sec Loss 7.8672 LearningRate 0.0817 Epoch: 9 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:00,347-Speed 3478.32 samples/sec Loss 7.6423 
LearningRate 0.0816 Epoch: 9 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:03,308-Speed 3458.85 samples/sec Loss 7.9660 LearningRate 0.0816 Epoch: 9 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:06,260-Speed 3470.23 samples/sec Loss 7.9599 LearningRate 0.0816 Epoch: 9 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:09,224-Speed 3455.24 samples/sec Loss 7.8341 LearningRate 0.0816 Epoch: 9 Global Step: 45690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:12,171-Speed 3475.42 samples/sec Loss 7.8556 LearningRate 0.0816 Epoch: 9 Global Step: 45700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:15,124-Speed 3469.09 samples/sec Loss 7.7521 LearningRate 0.0815 Epoch: 9 Global Step: 45710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:39:18,051-Speed 3499.57 samples/sec Loss 7.9624 LearningRate 0.0815 Epoch: 9 Global Step: 45720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:20,996-Speed 3478.24 samples/sec Loss 7.8415 LearningRate 0.0815 Epoch: 9 Global Step: 45730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:23,988-Speed 3422.57 samples/sec Loss 7.9006 LearningRate 0.0815 Epoch: 9 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:26,927-Speed 3485.24 samples/sec Loss 7.8739 LearningRate 0.0815 Epoch: 9 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:29,861-Speed 3491.05 samples/sec Loss 8.1231 LearningRate 0.0814 Epoch: 9 Global Step: 45760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:32,895-Speed 3376.55 samples/sec Loss 8.0356 LearningRate 0.0814 Epoch: 9 Global Step: 45770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:35,882-Speed 3428.79 samples/sec Loss 7.9096 LearningRate 0.0814 Epoch: 9 Global Step: 45780 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:38,846-Speed 3455.70 samples/sec Loss 8.1604 LearningRate 0.0814 Epoch: 9 Global Step: 45790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:41,782-Speed 3488.45 samples/sec Loss 7.9336 LearningRate 0.0814 Epoch: 9 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:44,729-Speed 3476.13 samples/sec Loss 7.8275 LearningRate 0.0813 Epoch: 9 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:47,736-Speed 3406.72 samples/sec Loss 8.1413 LearningRate 0.0813 Epoch: 9 Global Step: 45820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:50,722-Speed 3430.56 samples/sec Loss 8.0274 LearningRate 0.0813 Epoch: 9 Global Step: 45830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:53,677-Speed 3466.58 samples/sec Loss 8.0110 LearningRate 0.0813 Epoch: 9 Global Step: 45840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:56,632-Speed 3473.42 samples/sec Loss 8.0718 LearningRate 0.0813 Epoch: 9 Global Step: 45850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:39:59,581-Speed 3472.77 samples/sec Loss 8.1799 LearningRate 0.0812 Epoch: 9 Global Step: 45860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:02,543-Speed 3458.11 samples/sec Loss 8.2125 LearningRate 0.0812 Epoch: 9 Global Step: 45870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:05,501-Speed 3462.82 samples/sec Loss 8.0043 LearningRate 0.0812 Epoch: 9 Global Step: 45880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:08,449-Speed 3474.95 samples/sec Loss 7.8785 LearningRate 0.0812 Epoch: 9 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:11,393-Speed 3479.00 samples/sec Loss 8.1540 LearningRate 0.0812 Epoch: 9 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-19 21:40:14,476-Speed 3321.93 samples/sec Loss 8.2096 LearningRate 0.0811 Epoch: 9 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:17,474-Speed 3416.81 samples/sec Loss 8.1880 LearningRate 0.0811 Epoch: 9 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:20,456-Speed 3434.21 samples/sec Loss 8.0623 LearningRate 0.0811 Epoch: 9 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:23,428-Speed 3447.12 samples/sec Loss 8.2155 LearningRate 0.0811 Epoch: 9 Global Step: 45940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:26,396-Speed 3451.64 samples/sec Loss 8.3534 LearningRate 0.0811 Epoch: 9 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:29,350-Speed 3467.70 samples/sec Loss 8.2056 LearningRate 0.0810 Epoch: 9 Global Step: 45960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:40:32,287-Speed 3487.33 samples/sec Loss 8.0903 LearningRate 0.0810 Epoch: 9 Global Step: 45970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:40:35,226-Speed 3485.90 samples/sec Loss 8.0452 LearningRate 0.0810 Epoch: 9 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:40:38,199-Speed 3445.43 samples/sec Loss 8.2157 LearningRate 0.0810 Epoch: 9 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:40:41,202-Speed 3411.28 samples/sec Loss 8.1895 LearningRate 0.0810 Epoch: 9 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:41:24,556-[lfw][46000]XNorm: 22.671222 Training: 2022-01-19 21:41:24,557-[lfw][46000]Accuracy-Flip: 0.99633+-0.00340 Training: 2022-01-19 21:41:24,557-[lfw][46000]Accuracy-Highest: 0.99700 Training: 2022-01-19 21:42:14,789-[cfp_fp][46000]XNorm: 19.698489 Training: 2022-01-19 21:42:14,789-[cfp_fp][46000]Accuracy-Flip: 0.96257+-0.01002 Training: 2022-01-19 
21:42:14,790-[cfp_fp][46000]Accuracy-Highest: 0.96257 Training: 2022-01-19 21:42:57,949-[agedb_30][46000]XNorm: 22.271514 Training: 2022-01-19 21:42:57,950-[agedb_30][46000]Accuracy-Flip: 0.97000+-0.00592 Training: 2022-01-19 21:42:57,951-[agedb_30][46000]Accuracy-Highest: 0.97350 Training: 2022-01-19 21:43:00,897-Speed 73.30 samples/sec Loss 8.2015 LearningRate 0.0809 Epoch: 9 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:43:03,834-Speed 3486.88 samples/sec Loss 8.1461 LearningRate 0.0809 Epoch: 9 Global Step: 46020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:43:06,792-Speed 3462.88 samples/sec Loss 8.2847 LearningRate 0.0809 Epoch: 9 Global Step: 46030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:43:09,718-Speed 3500.22 samples/sec Loss 8.1301 LearningRate 0.0809 Epoch: 9 Global Step: 46040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:43:12,722-Speed 3409.48 samples/sec Loss 8.2152 LearningRate 0.0809 Epoch: 9 Global Step: 46050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:43:15,679-Speed 3465.05 samples/sec Loss 8.2977 LearningRate 0.0808 Epoch: 9 Global Step: 46060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:43:18,614-Speed 3490.15 samples/sec Loss 8.3104 LearningRate 0.0808 Epoch: 9 Global Step: 46070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:21,541-Speed 3498.70 samples/sec Loss 8.2079 LearningRate 0.0808 Epoch: 9 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:24,500-Speed 3462.28 samples/sec Loss 8.3711 LearningRate 0.0808 Epoch: 9 Global Step: 46090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:27,434-Speed 3490.55 samples/sec Loss 8.2877 LearningRate 0.0808 Epoch: 9 Global Step: 46100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:30,386-Speed 3470.75 samples/sec Loss 8.3034 LearningRate 0.0807 Epoch: 9 
Global Step: 46110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:33,326-Speed 3483.82 samples/sec Loss 8.3251 LearningRate 0.0807 Epoch: 9 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:36,277-Speed 3470.28 samples/sec Loss 8.3077 LearningRate 0.0807 Epoch: 9 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:39,265-Speed 3428.87 samples/sec Loss 8.2475 LearningRate 0.0807 Epoch: 9 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:42,216-Speed 3470.04 samples/sec Loss 8.1113 LearningRate 0.0807 Epoch: 9 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:45,172-Speed 3465.54 samples/sec Loss 8.3888 LearningRate 0.0806 Epoch: 9 Global Step: 46160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:48,134-Speed 3457.42 samples/sec Loss 8.2123 LearningRate 0.0806 Epoch: 9 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:51,107-Speed 3445.36 samples/sec Loss 8.2544 LearningRate 0.0806 Epoch: 9 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:54,059-Speed 3470.32 samples/sec Loss 8.3136 LearningRate 0.0806 Epoch: 9 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:43:57,000-Speed 3482.62 samples/sec Loss 8.2476 LearningRate 0.0806 Epoch: 9 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:00,003-Speed 3411.30 samples/sec Loss 8.2849 LearningRate 0.0805 Epoch: 9 Global Step: 46210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:03,124-Speed 3281.14 samples/sec Loss 8.2280 LearningRate 0.0805 Epoch: 9 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:06,075-Speed 3471.90 samples/sec Loss 8.2003 LearningRate 0.0805 Epoch: 9 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-01-19 21:44:09,006-Speed 3494.37 samples/sec Loss 8.2871 LearningRate 0.0805 Epoch: 9 Global Step: 46240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:11,956-Speed 3472.02 samples/sec Loss 8.3244 LearningRate 0.0805 Epoch: 9 Global Step: 46250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:14,903-Speed 3475.84 samples/sec Loss 8.4300 LearningRate 0.0804 Epoch: 9 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:17,838-Speed 3489.29 samples/sec Loss 8.2315 LearningRate 0.0804 Epoch: 9 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:20,800-Speed 3458.54 samples/sec Loss 8.2290 LearningRate 0.0804 Epoch: 9 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:23,755-Speed 3466.36 samples/sec Loss 8.4356 LearningRate 0.0804 Epoch: 9 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:26,719-Speed 3455.22 samples/sec Loss 8.3538 LearningRate 0.0804 Epoch: 9 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:29,717-Speed 3417.63 samples/sec Loss 8.3106 LearningRate 0.0803 Epoch: 9 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:32,717-Speed 3413.53 samples/sec Loss 8.3342 LearningRate 0.0803 Epoch: 9 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:35,653-Speed 3489.15 samples/sec Loss 8.2123 LearningRate 0.0803 Epoch: 9 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:44:38,594-Speed 3482.38 samples/sec Loss 8.2586 LearningRate 0.0803 Epoch: 9 Global Step: 46340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:41,548-Speed 3467.53 samples/sec Loss 8.2821 LearningRate 0.0803 Epoch: 9 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:44,591-Speed 3365.73 samples/sec 
Loss 8.2982 LearningRate 0.0802 Epoch: 9 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:47,541-Speed 3472.53 samples/sec Loss 8.4187 LearningRate 0.0802 Epoch: 9 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:50,475-Speed 3490.71 samples/sec Loss 8.4058 LearningRate 0.0802 Epoch: 9 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:53,442-Speed 3452.91 samples/sec Loss 8.5342 LearningRate 0.0802 Epoch: 9 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:56,422-Speed 3437.00 samples/sec Loss 8.1804 LearningRate 0.0802 Epoch: 9 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:44:59,379-Speed 3464.50 samples/sec Loss 8.1720 LearningRate 0.0801 Epoch: 9 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:45:02,328-Speed 3473.35 samples/sec Loss 8.1642 LearningRate 0.0801 Epoch: 9 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:45:05,386-Speed 3349.12 samples/sec Loss 8.3022 LearningRate 0.0801 Epoch: 9 Global Step: 46430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:45:08,472-Speed 3319.02 samples/sec Loss 8.3008 LearningRate 0.0801 Epoch: 9 Global Step: 46440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:45:11,453-Speed 3436.42 samples/sec Loss 8.2581 LearningRate 0.0801 Epoch: 9 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:45:14,410-Speed 3463.85 samples/sec Loss 8.4775 LearningRate 0.0800 Epoch: 9 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:45:17,354-Speed 3478.64 samples/sec Loss 8.3724 LearningRate 0.0800 Epoch: 9 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:20,302-Speed 3476.24 samples/sec Loss 8.2632 LearningRate 0.0800 Epoch: 9 Global Step: 
46480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:23,317-Speed 3396.84 samples/sec Loss 8.2647 LearningRate 0.0800 Epoch: 9 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:26,311-Speed 3421.05 samples/sec Loss 8.3459 LearningRate 0.0800 Epoch: 9 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:29,250-Speed 3486.52 samples/sec Loss 8.3731 LearningRate 0.0799 Epoch: 9 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:32,178-Speed 3497.73 samples/sec Loss 8.3806 LearningRate 0.0799 Epoch: 9 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:35,111-Speed 3493.16 samples/sec Loss 8.3781 LearningRate 0.0799 Epoch: 9 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:38,061-Speed 3471.63 samples/sec Loss 8.3066 LearningRate 0.0799 Epoch: 9 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:41,030-Speed 3450.15 samples/sec Loss 8.3639 LearningRate 0.0799 Epoch: 9 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:43,965-Speed 3489.48 samples/sec Loss 8.3316 LearningRate 0.0798 Epoch: 9 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:45:46,921-Speed 3464.89 samples/sec Loss 8.4473 LearningRate 0.0798 Epoch: 9 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:45:49,858-Speed 3487.85 samples/sec Loss 8.2458 LearningRate 0.0798 Epoch: 9 Global Step: 46580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:45:52,824-Speed 3453.56 samples/sec Loss 8.3542 LearningRate 0.0798 Epoch: 9 Global Step: 46590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:45:55,806-Speed 3434.70 samples/sec Loss 8.2290 LearningRate 0.0798 Epoch: 9 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-19 21:45:58,794-Speed 3428.47 samples/sec Loss 8.4089 LearningRate 0.0797 Epoch: 9 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:01,747-Speed 3467.85 samples/sec Loss 8.2964 LearningRate 0.0797 Epoch: 9 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:04,689-Speed 3482.11 samples/sec Loss 8.4153 LearningRate 0.0797 Epoch: 9 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:07,631-Speed 3481.88 samples/sec Loss 8.2157 LearningRate 0.0797 Epoch: 9 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:10,594-Speed 3456.61 samples/sec Loss 8.5083 LearningRate 0.0797 Epoch: 9 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:13,530-Speed 3488.77 samples/sec Loss 8.4521 LearningRate 0.0796 Epoch: 9 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:16,518-Speed 3428.18 samples/sec Loss 8.2836 LearningRate 0.0796 Epoch: 9 Global Step: 46670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:46:19,458-Speed 3484.59 samples/sec Loss 8.4137 LearningRate 0.0796 Epoch: 9 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:22,429-Speed 3447.72 samples/sec Loss 8.2001 LearningRate 0.0796 Epoch: 9 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:25,420-Speed 3425.07 samples/sec Loss 8.3794 LearningRate 0.0796 Epoch: 9 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:28,377-Speed 3462.75 samples/sec Loss 8.5773 LearningRate 0.0795 Epoch: 9 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:31,324-Speed 3475.79 samples/sec Loss 8.4407 LearningRate 0.0795 Epoch: 9 Global Step: 46720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:34,274-Speed 3473.32 samples/sec Loss 
8.4212 LearningRate 0.0795 Epoch: 9 Global Step: 46730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:37,257-Speed 3433.68 samples/sec Loss 8.3326 LearningRate 0.0795 Epoch: 9 Global Step: 46740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:40,201-Speed 3479.13 samples/sec Loss 8.3021 LearningRate 0.0795 Epoch: 9 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:43,169-Speed 3451.87 samples/sec Loss 8.3308 LearningRate 0.0794 Epoch: 9 Global Step: 46760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:46,162-Speed 3422.93 samples/sec Loss 8.2733 LearningRate 0.0794 Epoch: 9 Global Step: 46770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:49,170-Speed 3404.66 samples/sec Loss 8.4610 LearningRate 0.0794 Epoch: 9 Global Step: 46780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:46:52,095-Speed 3502.27 samples/sec Loss 8.4563 LearningRate 0.0794 Epoch: 9 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:55,027-Speed 3493.59 samples/sec Loss 8.3020 LearningRate 0.0794 Epoch: 9 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:46:57,962-Speed 3489.10 samples/sec Loss 8.1651 LearningRate 0.0793 Epoch: 9 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:00,901-Speed 3485.31 samples/sec Loss 8.1344 LearningRate 0.0793 Epoch: 9 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:03,850-Speed 3473.56 samples/sec Loss 8.2237 LearningRate 0.0793 Epoch: 9 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:06,801-Speed 3470.70 samples/sec Loss 8.2048 LearningRate 0.0793 Epoch: 9 Global Step: 46840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:09,771-Speed 3448.71 samples/sec Loss 8.3431 LearningRate 0.0793 Epoch: 9 Global Step: 46850 
Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:12,747-Speed 3442.00 samples/sec Loss 8.4950 LearningRate 0.0792 Epoch: 9 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:15,740-Speed 3422.36 samples/sec Loss 8.3680 LearningRate 0.0792 Epoch: 9 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:18,954-Speed 3187.33 samples/sec Loss 8.4021 LearningRate 0.0792 Epoch: 9 Global Step: 46880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:21,894-Speed 3483.60 samples/sec Loss 8.2192 LearningRate 0.0792 Epoch: 9 Global Step: 46890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:47:24,828-Speed 3491.37 samples/sec Loss 8.2262 LearningRate 0.0792 Epoch: 9 Global Step: 46900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:27,872-Speed 3364.23 samples/sec Loss 8.3161 LearningRate 0.0791 Epoch: 9 Global Step: 46910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:30,828-Speed 3466.02 samples/sec Loss 8.3361 LearningRate 0.0791 Epoch: 9 Global Step: 46920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:33,854-Speed 3384.14 samples/sec Loss 8.2123 LearningRate 0.0791 Epoch: 9 Global Step: 46930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:36,890-Speed 3374.03 samples/sec Loss 8.3428 LearningRate 0.0791 Epoch: 9 Global Step: 46940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:39,923-Speed 3377.09 samples/sec Loss 8.3229 LearningRate 0.0791 Epoch: 9 Global Step: 46950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:42,870-Speed 3475.42 samples/sec Loss 8.1299 LearningRate 0.0790 Epoch: 9 Global Step: 46960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:45,816-Speed 3477.64 samples/sec Loss 8.2633 LearningRate 0.0790 Epoch: 9 Global Step: 46970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-19 21:47:48,782-Speed 3453.12 samples/sec Loss 8.3963 LearningRate 0.0790 Epoch: 9 Global Step: 46980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:51,719-Speed 3486.66 samples/sec Loss 8.2210 LearningRate 0.0790 Epoch: 9 Global Step: 46990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:47:54,707-Speed 3428.82 samples/sec Loss 8.2717 LearningRate 0.0790 Epoch: 9 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:47:57,749-Speed 3366.69 samples/sec Loss 8.3363 LearningRate 0.0789 Epoch: 9 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:00,699-Speed 3472.05 samples/sec Loss 8.3810 LearningRate 0.0789 Epoch: 9 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:03,672-Speed 3445.08 samples/sec Loss 8.0539 LearningRate 0.0789 Epoch: 9 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:06,617-Speed 3477.80 samples/sec Loss 8.2178 LearningRate 0.0789 Epoch: 9 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:09,564-Speed 3478.37 samples/sec Loss 8.2338 LearningRate 0.0789 Epoch: 9 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:12,503-Speed 3484.75 samples/sec Loss 8.3576 LearningRate 0.0788 Epoch: 9 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:15,485-Speed 3434.87 samples/sec Loss 8.4278 LearningRate 0.0788 Epoch: 9 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:18,455-Speed 3449.02 samples/sec Loss 8.5745 LearningRate 0.0788 Epoch: 9 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:21,389-Speed 3491.42 samples/sec Loss 8.5091 LearningRate 0.0788 Epoch: 9 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:48:24,329-Speed 3483.38 samples/sec Loss 8.3426 
LearningRate 0.0788 Epoch: 9 Global Step: 47100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:27,338-Speed 3403.95 samples/sec Loss 8.2783 LearningRate 0.0787 Epoch: 9 Global Step: 47110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:30,304-Speed 3453.36 samples/sec Loss 8.5325 LearningRate 0.0787 Epoch: 9 Global Step: 47120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:33,362-Speed 3349.32 samples/sec Loss 8.2663 LearningRate 0.0787 Epoch: 9 Global Step: 47130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:36,356-Speed 3421.73 samples/sec Loss 8.1653 LearningRate 0.0787 Epoch: 9 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:39,359-Speed 3410.34 samples/sec Loss 8.3731 LearningRate 0.0787 Epoch: 9 Global Step: 47150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:42,322-Speed 3456.72 samples/sec Loss 8.2346 LearningRate 0.0786 Epoch: 9 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:45,302-Speed 3437.97 samples/sec Loss 8.2747 LearningRate 0.0786 Epoch: 9 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:48,247-Speed 3477.14 samples/sec Loss 8.3428 LearningRate 0.0786 Epoch: 9 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:51,211-Speed 3456.28 samples/sec Loss 8.3939 LearningRate 0.0786 Epoch: 9 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:54,152-Speed 3484.16 samples/sec Loss 8.3501 LearningRate 0.0786 Epoch: 9 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:48:57,080-Speed 3497.59 samples/sec Loss 8.3471 LearningRate 0.0785 Epoch: 9 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:00,116-Speed 3373.36 samples/sec Loss 8.3077 LearningRate 0.0785 Epoch: 9 Global Step: 47220 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:03,111-Speed 3420.94 samples/sec Loss 8.2542 LearningRate 0.0785 Epoch: 9 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:06,069-Speed 3462.93 samples/sec Loss 8.1877 LearningRate 0.0785 Epoch: 9 Global Step: 47240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:09,020-Speed 3470.93 samples/sec Loss 8.3812 LearningRate 0.0785 Epoch: 9 Global Step: 47250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:11,964-Speed 3479.23 samples/sec Loss 8.3642 LearningRate 0.0784 Epoch: 9 Global Step: 47260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:14,936-Speed 3446.56 samples/sec Loss 8.2656 LearningRate 0.0784 Epoch: 9 Global Step: 47270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:17,886-Speed 3471.20 samples/sec Loss 8.2569 LearningRate 0.0784 Epoch: 9 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:20,838-Speed 3470.31 samples/sec Loss 8.2510 LearningRate 0.0784 Epoch: 9 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:23,786-Speed 3473.98 samples/sec Loss 8.1793 LearningRate 0.0784 Epoch: 9 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:26,723-Speed 3487.63 samples/sec Loss 8.4285 LearningRate 0.0783 Epoch: 9 Global Step: 47310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:29,696-Speed 3445.88 samples/sec Loss 8.2254 LearningRate 0.0783 Epoch: 9 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:32,639-Speed 3480.31 samples/sec Loss 8.2217 LearningRate 0.0783 Epoch: 9 Global Step: 47330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:35,613-Speed 3444.55 samples/sec Loss 8.0967 LearningRate 0.0783 Epoch: 9 Global Step: 47340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-19 21:49:38,577-Speed 3456.01 samples/sec Loss 8.3290 LearningRate 0.0783 Epoch: 9 Global Step: 47350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:41,530-Speed 3468.56 samples/sec Loss 8.2936 LearningRate 0.0782 Epoch: 9 Global Step: 47360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:44,485-Speed 3465.80 samples/sec Loss 8.2333 LearningRate 0.0782 Epoch: 9 Global Step: 47370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:47,498-Speed 3399.08 samples/sec Loss 8.1287 LearningRate 0.0782 Epoch: 9 Global Step: 47380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:49:50,481-Speed 3434.42 samples/sec Loss 8.2414 LearningRate 0.0782 Epoch: 9 Global Step: 47390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:49:53,413-Speed 3492.96 samples/sec Loss 8.3257 LearningRate 0.0782 Epoch: 9 Global Step: 47400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:49:56,340-Speed 3499.76 samples/sec Loss 8.2306 LearningRate 0.0781 Epoch: 9 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:49:59,316-Speed 3442.49 samples/sec Loss 8.2742 LearningRate 0.0781 Epoch: 9 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:02,329-Speed 3399.59 samples/sec Loss 8.1983 LearningRate 0.0781 Epoch: 9 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:05,337-Speed 3464.24 samples/sec Loss 8.4093 LearningRate 0.0781 Epoch: 9 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:08,306-Speed 3450.01 samples/sec Loss 8.3444 LearningRate 0.0781 Epoch: 9 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:11,317-Speed 3472.55 samples/sec Loss 8.2667 LearningRate 0.0780 Epoch: 9 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:14,276-Speed 3461.09 samples/sec Loss 8.4136 
LearningRate 0.0780 Epoch: 9 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:17,263-Speed 3429.07 samples/sec Loss 8.2028 LearningRate 0.0780 Epoch: 9 Global Step: 47480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:20,256-Speed 3483.56 samples/sec Loss 8.3089 LearningRate 0.0780 Epoch: 9 Global Step: 47490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:50:23,201-Speed 3478.06 samples/sec Loss 8.2707 LearningRate 0.0780 Epoch: 9 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:26,227-Speed 3480.46 samples/sec Loss 8.2442 LearningRate 0.0779 Epoch: 9 Global Step: 47510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:29,163-Speed 3489.36 samples/sec Loss 8.2822 LearningRate 0.0779 Epoch: 9 Global Step: 47520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:32,100-Speed 3486.85 samples/sec Loss 8.2573 LearningRate 0.0779 Epoch: 9 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:35,045-Speed 3478.39 samples/sec Loss 8.3130 LearningRate 0.0779 Epoch: 9 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:38,005-Speed 3459.89 samples/sec Loss 8.3140 LearningRate 0.0779 Epoch: 9 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:40,986-Speed 3436.34 samples/sec Loss 8.4145 LearningRate 0.0778 Epoch: 9 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:43,971-Speed 3431.44 samples/sec Loss 8.2471 LearningRate 0.0778 Epoch: 9 Global Step: 47570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:46,951-Speed 3437.46 samples/sec Loss 8.2496 LearningRate 0.0778 Epoch: 9 Global Step: 47580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:50:49,941-Speed 3425.77 samples/sec Loss 8.1676 LearningRate 0.0778 Epoch: 9 Global Step: 47590 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-01-19 21:50:52,874-Speed 3492.52 samples/sec Loss 8.2803 LearningRate 0.0778 Epoch: 9 Global Step: 47600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:50:55,806-Speed 3493.88 samples/sec Loss 8.3553 LearningRate 0.0777 Epoch: 9 Global Step: 47610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:50:58,743-Speed 3487.12 samples/sec Loss 8.3255 LearningRate 0.0777 Epoch: 9 Global Step: 47620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:01,708-Speed 3455.45 samples/sec Loss 8.4028 LearningRate 0.0777 Epoch: 9 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:04,709-Speed 3412.87 samples/sec Loss 8.3846 LearningRate 0.0777 Epoch: 9 Global Step: 47640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:07,747-Speed 3371.48 samples/sec Loss 8.3754 LearningRate 0.0777 Epoch: 9 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:10,707-Speed 3460.96 samples/sec Loss 8.2046 LearningRate 0.0776 Epoch: 9 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:13,641-Speed 3490.63 samples/sec Loss 8.0881 LearningRate 0.0776 Epoch: 9 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:16,572-Speed 3494.60 samples/sec Loss 8.2208 LearningRate 0.0776 Epoch: 9 Global Step: 47680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:19,509-Speed 3487.65 samples/sec Loss 8.2139 LearningRate 0.0776 Epoch: 9 Global Step: 47690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:22,521-Speed 3400.38 samples/sec Loss 8.3231 LearningRate 0.0776 Epoch: 9 Global Step: 47700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:25,469-Speed 3475.01 samples/sec Loss 8.2993 LearningRate 0.0776 Epoch: 9 Global Step: 47710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 
21:51:28,441-Speed 3445.58 samples/sec Loss 8.2374 LearningRate 0.0775 Epoch: 9 Global Step: 47720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:31,514-Speed 3333.66 samples/sec Loss 8.2667 LearningRate 0.0775 Epoch: 9 Global Step: 47730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:34,459-Speed 3477.68 samples/sec Loss 8.3811 LearningRate 0.0775 Epoch: 9 Global Step: 47740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:37,427-Speed 3450.58 samples/sec Loss 8.2113 LearningRate 0.0775 Epoch: 9 Global Step: 47750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:40,428-Speed 3414.43 samples/sec Loss 8.2139 LearningRate 0.0775 Epoch: 9 Global Step: 47760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:43,362-Speed 3490.82 samples/sec Loss 8.0814 LearningRate 0.0774 Epoch: 9 Global Step: 47770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:46,303-Speed 3482.21 samples/sec Loss 8.1361 LearningRate 0.0774 Epoch: 9 Global Step: 47780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:49,243-Speed 3484.26 samples/sec Loss 8.1696 LearningRate 0.0774 Epoch: 9 Global Step: 47790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:52,182-Speed 3486.13 samples/sec Loss 8.2682 LearningRate 0.0774 Epoch: 9 Global Step: 47800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 21:51:55,154-Speed 3446.91 samples/sec Loss 8.2446 LearningRate 0.0774 Epoch: 9 Global Step: 47810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:51:58,091-Speed 3487.54 samples/sec Loss 8.3721 LearningRate 0.0773 Epoch: 9 Global Step: 47820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:52:01,025-Speed 3491.32 samples/sec Loss 8.0823 LearningRate 0.0773 Epoch: 9 Global Step: 47830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:52:03,971-Speed 3476.29 samples/sec Loss 8.2161 
LearningRate 0.0773 Epoch: 9 Global Step: 47840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:52:07,007-Speed 3373.99 samples/sec Loss 8.3394 LearningRate 0.0773 Epoch: 9 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:09,981-Speed 3443.52 samples/sec Loss 8.1253 LearningRate 0.0773 Epoch: 9 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:12,983-Speed 3411.54 samples/sec Loss 8.4075 LearningRate 0.0772 Epoch: 9 Global Step: 47870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:15,943-Speed 3461.35 samples/sec Loss 8.1926 LearningRate 0.0772 Epoch: 9 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:18,929-Speed 3429.60 samples/sec Loss 8.1662 LearningRate 0.0772 Epoch: 9 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:21,901-Speed 3447.75 samples/sec Loss 8.1824 LearningRate 0.0772 Epoch: 9 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:24,848-Speed 3475.36 samples/sec Loss 8.0987 LearningRate 0.0772 Epoch: 9 Global Step: 47910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:27,786-Speed 3485.53 samples/sec Loss 8.1938 LearningRate 0.0771 Epoch: 9 Global Step: 47920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:30,731-Speed 3477.93 samples/sec Loss 8.3685 LearningRate 0.0771 Epoch: 9 Global Step: 47930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:33,688-Speed 3464.16 samples/sec Loss 8.2786 LearningRate 0.0771 Epoch: 9 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:52:36,646-Speed 3463.39 samples/sec Loss 8.1577 LearningRate 0.0771 Epoch: 9 Global Step: 47950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:52:39,607-Speed 3458.14 samples/sec Loss 8.4669 LearningRate 0.0771 Epoch: 9 Global Step: 47960 Fp16 Grad Scale: 
131072 Required: 8 hours
Training: 2022-01-19 21:52:42,555-Speed 3475.07 samples/sec Loss 8.0927 LearningRate 0.0770 Epoch: 9 Global Step: 47970 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-01-19 21:52:45,504-Speed 3473.26 samples/sec Loss 8.2944 LearningRate 0.0770 Epoch: 9 Global Step: 47980 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-01-19 21:52:48,476-Speed 3446.63 samples/sec Loss 8.2743 LearningRate 0.0770 Epoch: 9 Global Step: 47990 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-01-19 21:52:51,480-Speed 3409.74 samples/sec Loss 8.2765 LearningRate 0.0770 Epoch: 9 Global Step: 48000 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-01-19 21:53:34,640-[lfw][48000]XNorm: 21.917442
Training: 2022-01-19 21:53:34,640-[lfw][48000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-01-19 21:53:34,641-[lfw][48000]Accuracy-Highest: 0.99767
Training: 2022-01-19 21:54:24,445-[cfp_fp][48000]XNorm: 19.223671
Training: 2022-01-19 21:54:24,446-[cfp_fp][48000]Accuracy-Flip: 0.95714+-0.01197
Training: 2022-01-19 21:54:24,446-[cfp_fp][48000]Accuracy-Highest: 0.96257
Training: 2022-01-19 21:55:07,208-[agedb_30][48000]XNorm: 21.635850
Training: 2022-01-19 21:55:07,209-[agedb_30][48000]Accuracy-Flip: 0.97250+-0.00672
Training: 2022-01-19 21:55:07,209-[agedb_30][48000]Accuracy-Highest: 0.97350
Training: 2022-01-19 21:55:10,179-Speed 73.83 samples/sec Loss 8.4224 LearningRate 0.0770 Epoch: 9 Global Step: 48010 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-01-19 21:55:13,124-Speed 3477.75 samples/sec Loss 8.1413 LearningRate 0.0769 Epoch: 9 Global Step: 48020 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-01-19 21:55:16,046-Speed 3506.33 samples/sec Loss 8.2898 LearningRate 0.0769 Epoch: 9 Global Step: 48030 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 2022-01-19 21:55:18,975-Speed 3496.62 samples/sec Loss 8.2874 LearningRate 0.0769 Epoch: 9 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 8 hours
Training: 
2022-01-19 21:55:21,911-Speed 3488.48 samples/sec Loss 8.1926 LearningRate 0.0769 Epoch: 9 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:24,845-Speed 3491.44 samples/sec Loss 8.2332 LearningRate 0.0769 Epoch: 9 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:27,798-Speed 3468.33 samples/sec Loss 8.0464 LearningRate 0.0768 Epoch: 9 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:30,756-Speed 3462.77 samples/sec Loss 8.1639 LearningRate 0.0768 Epoch: 9 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:33,694-Speed 3486.57 samples/sec Loss 8.3244 LearningRate 0.0768 Epoch: 9 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:36,681-Speed 3428.64 samples/sec Loss 8.3050 LearningRate 0.0768 Epoch: 9 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:39,653-Speed 3447.59 samples/sec Loss 8.2081 LearningRate 0.0768 Epoch: 9 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:42,593-Speed 3484.13 samples/sec Loss 8.4536 LearningRate 0.0767 Epoch: 9 Global Step: 48120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:45,546-Speed 3467.89 samples/sec Loss 8.0963 LearningRate 0.0767 Epoch: 9 Global Step: 48130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:48,488-Speed 3481.89 samples/sec Loss 8.2737 LearningRate 0.0767 Epoch: 9 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:55:51,431-Speed 3479.99 samples/sec Loss 8.3943 LearningRate 0.0767 Epoch: 9 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:55:54,377-Speed 3477.45 samples/sec Loss 8.2310 LearningRate 0.0767 Epoch: 9 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:55:57,335-Speed 3462.33 samples/sec Loss 8.1350 
LearningRate 0.0766 Epoch: 9 Global Step: 48170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:00,323-Speed 3428.19 samples/sec Loss 8.3268 LearningRate 0.0766 Epoch: 9 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:03,268-Speed 3477.81 samples/sec Loss 8.2340 LearningRate 0.0766 Epoch: 9 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:06,202-Speed 3491.15 samples/sec Loss 8.2730 LearningRate 0.0766 Epoch: 9 Global Step: 48200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:09,156-Speed 3467.72 samples/sec Loss 8.1922 LearningRate 0.0766 Epoch: 9 Global Step: 48210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:12,119-Speed 3456.62 samples/sec Loss 8.2445 LearningRate 0.0765 Epoch: 9 Global Step: 48220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:15,189-Speed 3336.99 samples/sec Loss 8.1016 LearningRate 0.0765 Epoch: 9 Global Step: 48230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:18,139-Speed 3471.61 samples/sec Loss 8.3235 LearningRate 0.0765 Epoch: 9 Global Step: 48240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:21,098-Speed 3462.05 samples/sec Loss 8.1461 LearningRate 0.0765 Epoch: 9 Global Step: 48250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:24,225-Speed 3275.43 samples/sec Loss 8.0852 LearningRate 0.0765 Epoch: 9 Global Step: 48260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:27,178-Speed 3468.83 samples/sec Loss 8.0499 LearningRate 0.0765 Epoch: 9 Global Step: 48270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:30,208-Speed 3380.78 samples/sec Loss 7.9798 LearningRate 0.0764 Epoch: 9 Global Step: 48280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:33,288-Speed 3326.03 samples/sec Loss 8.1267 LearningRate 0.0764 Epoch: 9 Global Step: 48290 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:36,235-Speed 3475.93 samples/sec Loss 8.1819 LearningRate 0.0764 Epoch: 9 Global Step: 48300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:39,204-Speed 3450.10 samples/sec Loss 8.4068 LearningRate 0.0764 Epoch: 9 Global Step: 48310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:42,181-Speed 3440.11 samples/sec Loss 8.2490 LearningRate 0.0764 Epoch: 9 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:45,129-Speed 3475.30 samples/sec Loss 8.1824 LearningRate 0.0763 Epoch: 9 Global Step: 48330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:48,083-Speed 3467.10 samples/sec Loss 8.1160 LearningRate 0.0763 Epoch: 9 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:51,012-Speed 3497.33 samples/sec Loss 8.2789 LearningRate 0.0763 Epoch: 9 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:56:53,954-Speed 3481.58 samples/sec Loss 8.1727 LearningRate 0.0763 Epoch: 9 Global Step: 48360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:56:57,024-Speed 3336.48 samples/sec Loss 8.2969 LearningRate 0.0763 Epoch: 9 Global Step: 48370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:00,036-Speed 3413.84 samples/sec Loss 8.0669 LearningRate 0.0762 Epoch: 9 Global Step: 48380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:02,981-Speed 3477.60 samples/sec Loss 8.1440 LearningRate 0.0762 Epoch: 9 Global Step: 48390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:05,925-Speed 3479.64 samples/sec Loss 8.2285 LearningRate 0.0762 Epoch: 9 Global Step: 48400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:08,895-Speed 3448.38 samples/sec Loss 8.1974 LearningRate 0.0762 Epoch: 9 Global Step: 48410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 
21:57:11,920-Speed 3385.72 samples/sec Loss 8.1572 LearningRate 0.0762 Epoch: 9 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:14,899-Speed 3438.38 samples/sec Loss 8.0724 LearningRate 0.0761 Epoch: 9 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:17,848-Speed 3473.60 samples/sec Loss 8.2315 LearningRate 0.0761 Epoch: 9 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:20,856-Speed 3404.84 samples/sec Loss 8.1042 LearningRate 0.0761 Epoch: 9 Global Step: 48450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:23,823-Speed 3452.21 samples/sec Loss 8.2557 LearningRate 0.0761 Epoch: 9 Global Step: 48460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:57:26,767-Speed 3479.28 samples/sec Loss 8.1394 LearningRate 0.0761 Epoch: 9 Global Step: 48470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:57:29,755-Speed 3428.26 samples/sec Loss 8.1429 LearningRate 0.0760 Epoch: 9 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:32,769-Speed 3398.11 samples/sec Loss 8.2118 LearningRate 0.0760 Epoch: 9 Global Step: 48490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:35,704-Speed 3489.49 samples/sec Loss 8.0728 LearningRate 0.0760 Epoch: 9 Global Step: 48500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:38,658-Speed 3468.65 samples/sec Loss 8.2244 LearningRate 0.0760 Epoch: 9 Global Step: 48510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:41,606-Speed 3474.39 samples/sec Loss 8.2535 LearningRate 0.0760 Epoch: 9 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:44,574-Speed 3451.12 samples/sec Loss 8.2884 LearningRate 0.0759 Epoch: 9 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:47,528-Speed 3467.55 samples/sec Loss 8.2913 LearningRate 
0.0759 Epoch: 9 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:50,480-Speed 3468.75 samples/sec Loss 8.1524 LearningRate 0.0759 Epoch: 9 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:53,437-Speed 3465.25 samples/sec Loss 8.2844 LearningRate 0.0759 Epoch: 9 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:56,412-Speed 3442.71 samples/sec Loss 8.2066 LearningRate 0.0759 Epoch: 9 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:57:59,360-Speed 3473.89 samples/sec Loss 8.2898 LearningRate 0.0758 Epoch: 9 Global Step: 48580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:02,378-Speed 3394.86 samples/sec Loss 8.1095 LearningRate 0.0758 Epoch: 9 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:05,360-Speed 3434.21 samples/sec Loss 8.1870 LearningRate 0.0758 Epoch: 9 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:08,344-Speed 3432.74 samples/sec Loss 7.9660 LearningRate 0.0758 Epoch: 9 Global Step: 48610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:11,299-Speed 3466.18 samples/sec Loss 8.0972 LearningRate 0.0758 Epoch: 9 Global Step: 48620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:14,363-Speed 3343.16 samples/sec Loss 8.1424 LearningRate 0.0757 Epoch: 9 Global Step: 48630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:17,354-Speed 3423.94 samples/sec Loss 8.2071 LearningRate 0.0757 Epoch: 9 Global Step: 48640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:20,329-Speed 3445.09 samples/sec Loss 8.1455 LearningRate 0.0757 Epoch: 9 Global Step: 48650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:23,295-Speed 3452.85 samples/sec Loss 8.1326 LearningRate 0.0757 Epoch: 9 Global Step: 48660 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-19 21:58:26,242-Speed 3475.66 samples/sec Loss 8.1460 LearningRate 0.0757 Epoch: 9 Global Step: 48670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:29,165-Speed 3504.77 samples/sec Loss 8.2226 LearningRate 0.0757 Epoch: 9 Global Step: 48680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:32,097-Speed 3492.81 samples/sec Loss 8.0595 LearningRate 0.0756 Epoch: 9 Global Step: 48690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:35,034-Speed 3487.53 samples/sec Loss 8.1404 LearningRate 0.0756 Epoch: 9 Global Step: 48700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:37,975-Speed 3482.60 samples/sec Loss 8.1267 LearningRate 0.0756 Epoch: 9 Global Step: 48710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:40,912-Speed 3488.25 samples/sec Loss 8.1323 LearningRate 0.0756 Epoch: 9 Global Step: 48720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:43,872-Speed 3460.44 samples/sec Loss 8.2490 LearningRate 0.0756 Epoch: 9 Global Step: 48730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:46,901-Speed 3381.00 samples/sec Loss 8.1057 LearningRate 0.0755 Epoch: 9 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:49,898-Speed 3418.19 samples/sec Loss 8.1662 LearningRate 0.0755 Epoch: 9 Global Step: 48750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:58:52,821-Speed 3503.74 samples/sec Loss 8.1428 LearningRate 0.0755 Epoch: 9 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:58:55,791-Speed 3448.50 samples/sec Loss 8.1467 LearningRate 0.0755 Epoch: 9 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:58:58,751-Speed 3461.62 samples/sec Loss 8.2531 LearningRate 0.0755 Epoch: 9 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:59:01,695-Speed 
3479.15 samples/sec Loss 8.1967 LearningRate 0.0754 Epoch: 9 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:59:04,673-Speed 3440.32 samples/sec Loss 8.1854 LearningRate 0.0754 Epoch: 9 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:59:07,627-Speed 3466.94 samples/sec Loss 8.2564 LearningRate 0.0754 Epoch: 9 Global Step: 48810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:59:10,626-Speed 3414.50 samples/sec Loss 8.1340 LearningRate 0.0754 Epoch: 9 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:59:13,574-Speed 3475.41 samples/sec Loss 8.3515 LearningRate 0.0754 Epoch: 9 Global Step: 48830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:59:16,520-Speed 3475.78 samples/sec Loss 8.0546 LearningRate 0.0753 Epoch: 9 Global Step: 48840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:59:19,523-Speed 3411.34 samples/sec Loss 8.2604 LearningRate 0.0753 Epoch: 9 Global Step: 48850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 21:59:22,465-Speed 3481.71 samples/sec Loss 8.0426 LearningRate 0.0753 Epoch: 9 Global Step: 48860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:59:25,404-Speed 3485.66 samples/sec Loss 8.0421 LearningRate 0.0753 Epoch: 9 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:59:28,348-Speed 3479.55 samples/sec Loss 8.0868 LearningRate 0.0753 Epoch: 9 Global Step: 48880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:59:31,295-Speed 3474.85 samples/sec Loss 8.1329 LearningRate 0.0752 Epoch: 9 Global Step: 48890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:59:34,311-Speed 3397.12 samples/sec Loss 8.1119 LearningRate 0.0752 Epoch: 9 Global Step: 48900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:59:37,297-Speed 3429.22 samples/sec Loss 8.0803 LearningRate 0.0752 Epoch: 9 
Global Step: 48910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:59:40,273-Speed 3442.66 samples/sec Loss 8.0418 LearningRate 0.0752 Epoch: 9 Global Step: 48920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:59:43,224-Speed 3470.89 samples/sec Loss 8.2150 LearningRate 0.0752 Epoch: 9 Global Step: 48930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 21:59:46,166-Speed 3481.07 samples/sec Loss 8.0856 LearningRate 0.0751 Epoch: 9 Global Step: 48940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:59:49,146-Speed 3437.06 samples/sec Loss 8.2304 LearningRate 0.0751 Epoch: 9 Global Step: 48950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:59:52,146-Speed 3415.21 samples/sec Loss 8.1913 LearningRate 0.0751 Epoch: 9 Global Step: 48960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:59:55,082-Speed 3488.23 samples/sec Loss 8.2772 LearningRate 0.0751 Epoch: 9 Global Step: 48970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 21:59:58,076-Speed 3421.58 samples/sec Loss 8.1751 LearningRate 0.0751 Epoch: 9 Global Step: 48980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:00:01,099-Speed 3387.41 samples/sec Loss 8.1694 LearningRate 0.0750 Epoch: 9 Global Step: 48990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:00:04,036-Speed 3487.62 samples/sec Loss 8.1017 LearningRate 0.0750 Epoch: 9 Global Step: 49000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:00:07,048-Speed 3400.84 samples/sec Loss 8.1578 LearningRate 0.0750 Epoch: 9 Global Step: 49010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:00:10,063-Speed 3398.01 samples/sec Loss 8.3209 LearningRate 0.0750 Epoch: 9 Global Step: 49020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:00:13,106-Speed 3365.75 samples/sec Loss 8.1109 LearningRate 0.0750 Epoch: 9 Global Step: 49030 Fp16 Grad Scale: 32768 Required: 8 hours 
Training: 2022-01-19 22:00:16,083-Speed 3439.65 samples/sec Loss 7.9984 LearningRate 0.0750 Epoch: 9 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:19,027-Speed 3479.92 samples/sec Loss 8.0390 LearningRate 0.0749 Epoch: 9 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:21,982-Speed 3466.21 samples/sec Loss 7.9476 LearningRate 0.0749 Epoch: 9 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:24,929-Speed 3475.86 samples/sec Loss 8.2385 LearningRate 0.0749 Epoch: 9 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:27,884-Speed 3466.46 samples/sec Loss 8.2868 LearningRate 0.0749 Epoch: 9 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:30,904-Speed 3390.96 samples/sec Loss 8.0825 LearningRate 0.0749 Epoch: 9 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:33,874-Speed 3449.51 samples/sec Loss 8.1360 LearningRate 0.0748 Epoch: 9 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:36,817-Speed 3480.31 samples/sec Loss 8.1409 LearningRate 0.0748 Epoch: 9 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:39,782-Speed 3454.06 samples/sec Loss 8.1006 LearningRate 0.0748 Epoch: 9 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:42,750-Speed 3451.86 samples/sec Loss 8.0864 LearningRate 0.0748 Epoch: 9 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:00:45,698-Speed 3473.94 samples/sec Loss 8.0261 LearningRate 0.0748 Epoch: 9 Global Step: 49140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:00:48,668-Speed 3448.83 samples/sec Loss 8.1980 LearningRate 0.0747 Epoch: 9 Global Step: 49150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:00:51,648-Speed 3436.88 samples/sec Loss 
7.9459 LearningRate 0.0747 Epoch: 9 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:00:54,578-Speed 3496.43 samples/sec Loss 8.2476 LearningRate 0.0747 Epoch: 9 Global Step: 49170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:00:57,519-Speed 3482.32 samples/sec Loss 8.0923 LearningRate 0.0747 Epoch: 9 Global Step: 49180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:01:00,460-Speed 3483.38 samples/sec Loss 7.9836 LearningRate 0.0747 Epoch: 9 Global Step: 49190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:01:03,418-Speed 3462.22 samples/sec Loss 8.1248 LearningRate 0.0746 Epoch: 9 Global Step: 49200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:01:06,391-Speed 3445.68 samples/sec Loss 8.1758 LearningRate 0.0746 Epoch: 9 Global Step: 49210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:01:09,348-Speed 3463.84 samples/sec Loss 8.2132 LearningRate 0.0746 Epoch: 9 Global Step: 49220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:01:12,326-Speed 3439.11 samples/sec Loss 8.2349 LearningRate 0.0746 Epoch: 9 Global Step: 49230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:01:15,307-Speed 3436.18 samples/sec Loss 8.0509 LearningRate 0.0746 Epoch: 9 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:18,306-Speed 3415.60 samples/sec Loss 7.9696 LearningRate 0.0745 Epoch: 9 Global Step: 49250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:21,263-Speed 3464.01 samples/sec Loss 8.0985 LearningRate 0.0745 Epoch: 9 Global Step: 49260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:24,219-Speed 3464.75 samples/sec Loss 8.0596 LearningRate 0.0745 Epoch: 9 Global Step: 49270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:27,185-Speed 3453.40 samples/sec Loss 8.0227 LearningRate 0.0745 Epoch: 9 Global Step: 49280 Fp16 
Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:30,125-Speed 3484.53 samples/sec Loss 8.0394 LearningRate 0.0745 Epoch: 9 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:33,059-Speed 3490.18 samples/sec Loss 8.1528 LearningRate 0.0744 Epoch: 9 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:36,032-Speed 3444.86 samples/sec Loss 8.1155 LearningRate 0.0744 Epoch: 9 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:39,029-Speed 3418.59 samples/sec Loss 8.1500 LearningRate 0.0744 Epoch: 9 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:42,047-Speed 3394.19 samples/sec Loss 8.0285 LearningRate 0.0744 Epoch: 9 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:44,972-Speed 3501.70 samples/sec Loss 7.9648 LearningRate 0.0744 Epoch: 9 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:47,907-Speed 3490.47 samples/sec Loss 8.1131 LearningRate 0.0744 Epoch: 9 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:50,880-Speed 3444.73 samples/sec Loss 7.9687 LearningRate 0.0743 Epoch: 9 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:53,809-Speed 3497.09 samples/sec Loss 8.0701 LearningRate 0.0743 Epoch: 9 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:56,790-Speed 3435.75 samples/sec Loss 8.0394 LearningRate 0.0743 Epoch: 9 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:01:59,794-Speed 3409.93 samples/sec Loss 8.1460 LearningRate 0.0743 Epoch: 9 Global Step: 49390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:02:02,774-Speed 3436.50 samples/sec Loss 8.1467 LearningRate 0.0743 Epoch: 9 Global Step: 49400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 
22:02:05,775-Speed 3413.65 samples/sec Loss 8.1186 LearningRate 0.0742 Epoch: 9 Global Step: 49410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:02:08,723-Speed 3474.98 samples/sec Loss 7.9329 LearningRate 0.0742 Epoch: 9 Global Step: 49420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:02:11,678-Speed 3466.14 samples/sec Loss 8.0034 LearningRate 0.0742 Epoch: 9 Global Step: 49430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:02:14,622-Speed 3479.70 samples/sec Loss 8.1384 LearningRate 0.0742 Epoch: 9 Global Step: 49440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:17,575-Speed 3467.86 samples/sec Loss 8.2196 LearningRate 0.0742 Epoch: 9 Global Step: 49450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:20,554-Speed 3438.33 samples/sec Loss 8.2796 LearningRate 0.0741 Epoch: 9 Global Step: 49460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:23,517-Speed 3456.86 samples/sec Loss 8.1747 LearningRate 0.0741 Epoch: 9 Global Step: 49470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:26,486-Speed 3450.19 samples/sec Loss 8.1294 LearningRate 0.0741 Epoch: 9 Global Step: 49480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:29,480-Speed 3421.42 samples/sec Loss 8.0114 LearningRate 0.0741 Epoch: 9 Global Step: 49490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:32,439-Speed 3460.54 samples/sec Loss 8.1222 LearningRate 0.0741 Epoch: 9 Global Step: 49500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:35,402-Speed 3457.10 samples/sec Loss 8.0768 LearningRate 0.0740 Epoch: 9 Global Step: 49510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:38,346-Speed 3479.55 samples/sec Loss 7.9490 LearningRate 0.0740 Epoch: 9 Global Step: 49520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:41,344-Speed 3416.87 samples/sec Loss 8.1378 
LearningRate 0.0740 Epoch: 9 Global Step: 49530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:44,333-Speed 3426.85 samples/sec Loss 8.2584 LearningRate 0.0740 Epoch: 9 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:47,275-Speed 3480.75 samples/sec Loss 8.1983 LearningRate 0.0740 Epoch: 9 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:50,217-Speed 3481.93 samples/sec Loss 8.0260 LearningRate 0.0739 Epoch: 9 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:53,165-Speed 3474.40 samples/sec Loss 8.0962 LearningRate 0.0739 Epoch: 9 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:56,146-Speed 3436.16 samples/sec Loss 8.1189 LearningRate 0.0739 Epoch: 9 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:02:59,114-Speed 3451.62 samples/sec Loss 8.1516 LearningRate 0.0739 Epoch: 9 Global Step: 49590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:02,089-Speed 3442.57 samples/sec Loss 8.1270 LearningRate 0.0739 Epoch: 9 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:05,033-Speed 3478.81 samples/sec Loss 8.0452 LearningRate 0.0739 Epoch: 9 Global Step: 49610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:08,003-Speed 3448.77 samples/sec Loss 8.2022 LearningRate 0.0738 Epoch: 9 Global Step: 49620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:10,958-Speed 3466.67 samples/sec Loss 8.0668 LearningRate 0.0738 Epoch: 9 Global Step: 49630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:13,907-Speed 3472.99 samples/sec Loss 8.0578 LearningRate 0.0738 Epoch: 9 Global Step: 49640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:16,902-Speed 3420.24 samples/sec Loss 8.0790 LearningRate 0.0738 Epoch: 9 Global Step: 49650 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:19,838-Speed 3489.11 samples/sec Loss 8.0441 LearningRate 0.0738 Epoch: 9 Global Step: 49660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:22,789-Speed 3470.81 samples/sec Loss 8.1518 LearningRate 0.0737 Epoch: 9 Global Step: 49670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:25,799-Speed 3402.26 samples/sec Loss 8.1505 LearningRate 0.0737 Epoch: 9 Global Step: 49680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:28,803-Speed 3410.26 samples/sec Loss 8.1955 LearningRate 0.0737 Epoch: 9 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:31,852-Speed 3359.39 samples/sec Loss 8.0617 LearningRate 0.0737 Epoch: 9 Global Step: 49700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:34,820-Speed 3450.86 samples/sec Loss 8.0992 LearningRate 0.0737 Epoch: 9 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:37,795-Speed 3442.76 samples/sec Loss 8.1718 LearningRate 0.0736 Epoch: 9 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:40,802-Speed 3407.34 samples/sec Loss 8.0916 LearningRate 0.0736 Epoch: 9 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:43,738-Speed 3488.58 samples/sec Loss 7.9222 LearningRate 0.0736 Epoch: 9 Global Step: 49740 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 22:03:46,668-Speed 3495.28 samples/sec Loss 8.0477 LearningRate 0.0736 Epoch: 9 Global Step: 49750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:49,669-Speed 3413.46 samples/sec Loss 8.0385 LearningRate 0.0736 Epoch: 9 Global Step: 49760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:52,777-Speed 3295.57 samples/sec Loss 8.1154 LearningRate 0.0735 Epoch: 9 Global Step: 49770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-19 22:03:55,788-Speed 3401.29 samples/sec Loss 8.1326 LearningRate 0.0735 Epoch: 9 Global Step: 49780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:03:58,739-Speed 3471.22 samples/sec Loss 8.1753 LearningRate 0.0735 Epoch: 9 Global Step: 49790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:01,728-Speed 3426.19 samples/sec Loss 8.2099 LearningRate 0.0735 Epoch: 9 Global Step: 49800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:04,688-Speed 3460.93 samples/sec Loss 8.0333 LearningRate 0.0735 Epoch: 9 Global Step: 49810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:07,651-Speed 3456.49 samples/sec Loss 7.8538 LearningRate 0.0734 Epoch: 9 Global Step: 49820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:10,604-Speed 3469.41 samples/sec Loss 8.0995 LearningRate 0.0734 Epoch: 9 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:13,554-Speed 3471.27 samples/sec Loss 7.9664 LearningRate 0.0734 Epoch: 9 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:16,482-Speed 3498.22 samples/sec Loss 8.1801 LearningRate 0.0734 Epoch: 9 Global Step: 49850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:19,443-Speed 3459.77 samples/sec Loss 8.1583 LearningRate 0.0734 Epoch: 9 Global Step: 49860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:22,475-Speed 3378.57 samples/sec Loss 8.1521 LearningRate 0.0734 Epoch: 9 Global Step: 49870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:25,462-Speed 3429.35 samples/sec Loss 8.1844 LearningRate 0.0733 Epoch: 9 Global Step: 49880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:28,436-Speed 3443.82 samples/sec Loss 8.1068 LearningRate 0.0733 Epoch: 9 Global Step: 49890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:04:31,391-Speed 3466.63 samples/sec Loss 
8.2024 LearningRate 0.0733 Epoch: 9 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:34,352-Speed 3459.28 samples/sec Loss 8.0557 LearningRate 0.0733 Epoch: 9 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:37,296-Speed 3479.34 samples/sec Loss 8.0676 LearningRate 0.0733 Epoch: 9 Global Step: 49920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:40,260-Speed 3456.22 samples/sec Loss 8.1622 LearningRate 0.0732 Epoch: 9 Global Step: 49930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:43,234-Speed 3443.65 samples/sec Loss 7.9401 LearningRate 0.0732 Epoch: 9 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:46,200-Speed 3453.08 samples/sec Loss 7.8907 LearningRate 0.0732 Epoch: 9 Global Step: 49950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:49,293-Speed 3312.29 samples/sec Loss 8.0144 LearningRate 0.0732 Epoch: 9 Global Step: 49960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:52,277-Speed 3432.80 samples/sec Loss 8.1118 LearningRate 0.0732 Epoch: 9 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:55,233-Speed 3465.10 samples/sec Loss 8.0755 LearningRate 0.0731 Epoch: 9 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:04:58,192-Speed 3461.99 samples/sec Loss 7.8440 LearningRate 0.0731 Epoch: 9 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:05:01,145-Speed 3468.06 samples/sec Loss 8.1081 LearningRate 0.0731 Epoch: 9 Global Step: 50000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:05:44,126-[lfw][50000]XNorm: 22.961925 Training: 2022-01-19 22:05:44,127-[lfw][50000]Accuracy-Flip: 0.99717+-0.00248 Training: 2022-01-19 22:05:44,127-[lfw][50000]Accuracy-Highest: 0.99767 Training: 2022-01-19 22:06:33,777-[cfp_fp][50000]XNorm: 20.215942 Training: 
2022-01-19 22:06:33,778-[cfp_fp][50000]Accuracy-Flip: 0.95857+-0.00860 Training: 2022-01-19 22:06:33,778-[cfp_fp][50000]Accuracy-Highest: 0.96257 Training: 2022-01-19 22:07:16,763-[agedb_30][50000]XNorm: 22.532144 Training: 2022-01-19 22:07:16,764-[agedb_30][50000]Accuracy-Flip: 0.97117+-0.00646 Training: 2022-01-19 22:07:16,764-[agedb_30][50000]Accuracy-Highest: 0.97350 Training: 2022-01-19 22:07:19,715-Speed 73.90 samples/sec Loss 8.0039 LearningRate 0.0731 Epoch: 9 Global Step: 50010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:22,672-Speed 3464.59 samples/sec Loss 7.9906 LearningRate 0.0731 Epoch: 9 Global Step: 50020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:25,618-Speed 3476.97 samples/sec Loss 8.0631 LearningRate 0.0730 Epoch: 9 Global Step: 50030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:28,565-Speed 3475.72 samples/sec Loss 8.0604 LearningRate 0.0730 Epoch: 9 Global Step: 50040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:31,500-Speed 3489.42 samples/sec Loss 7.9580 LearningRate 0.0730 Epoch: 9 Global Step: 50050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:34,450-Speed 3472.11 samples/sec Loss 8.0455 LearningRate 0.0730 Epoch: 9 Global Step: 50060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:37,406-Speed 3464.83 samples/sec Loss 7.9664 LearningRate 0.0730 Epoch: 9 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:40,351-Speed 3478.01 samples/sec Loss 7.9874 LearningRate 0.0730 Epoch: 9 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:43,325-Speed 3445.03 samples/sec Loss 8.0541 LearningRate 0.0729 Epoch: 9 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:46,320-Speed 3419.54 samples/sec Loss 7.8293 LearningRate 0.0729 Epoch: 9 Global Step: 50100 Fp16 Grad Scale: 262144 Required: 8 hours 
Training: 2022-01-19 22:07:49,265-Speed 3477.87 samples/sec Loss 8.0180 LearningRate 0.0729 Epoch: 9 Global Step: 50110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:07:52,213-Speed 3475.14 samples/sec Loss 8.0578 LearningRate 0.0729 Epoch: 9 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:07:55,215-Speed 3412.07 samples/sec Loss 7.7682 LearningRate 0.0729 Epoch: 9 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:07:58,203-Speed 3427.50 samples/sec Loss 8.1427 LearningRate 0.0728 Epoch: 9 Global Step: 50140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:01,297-Speed 3310.22 samples/sec Loss 8.1138 LearningRate 0.0728 Epoch: 9 Global Step: 50150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:04,356-Speed 3348.87 samples/sec Loss 8.1419 LearningRate 0.0728 Epoch: 9 Global Step: 50160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:07,381-Speed 3385.29 samples/sec Loss 8.0510 LearningRate 0.0728 Epoch: 9 Global Step: 50170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:10,338-Speed 3464.32 samples/sec Loss 8.1374 LearningRate 0.0728 Epoch: 9 Global Step: 50180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:13,352-Speed 3398.07 samples/sec Loss 8.1401 LearningRate 0.0727 Epoch: 9 Global Step: 50190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:16,321-Speed 3450.35 samples/sec Loss 8.1995 LearningRate 0.0727 Epoch: 9 Global Step: 50200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:19,322-Speed 3413.47 samples/sec Loss 8.1753 LearningRate 0.0727 Epoch: 9 Global Step: 50210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:22,294-Speed 3446.87 samples/sec Loss 7.9467 LearningRate 0.0727 Epoch: 9 Global Step: 50220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:25,279-Speed 3430.27 samples/sec Loss 
7.9424 LearningRate 0.0727 Epoch: 9 Global Step: 50230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:08:28,237-Speed 3463.59 samples/sec Loss 8.2244 LearningRate 0.0726 Epoch: 9 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:31,180-Speed 3479.87 samples/sec Loss 8.1442 LearningRate 0.0726 Epoch: 9 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:34,137-Speed 3465.01 samples/sec Loss 8.0541 LearningRate 0.0726 Epoch: 9 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:37,138-Speed 3413.00 samples/sec Loss 7.8394 LearningRate 0.0726 Epoch: 9 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:40,219-Speed 3324.31 samples/sec Loss 8.0688 LearningRate 0.0726 Epoch: 9 Global Step: 50280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:43,159-Speed 3483.94 samples/sec Loss 7.9239 LearningRate 0.0726 Epoch: 9 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:46,131-Speed 3446.33 samples/sec Loss 8.1704 LearningRate 0.0725 Epoch: 9 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:49,088-Speed 3464.09 samples/sec Loss 8.1719 LearningRate 0.0725 Epoch: 9 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:52,034-Speed 3477.21 samples/sec Loss 8.0124 LearningRate 0.0725 Epoch: 9 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:55,011-Speed 3439.81 samples/sec Loss 8.1401 LearningRate 0.0725 Epoch: 9 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:08:57,966-Speed 3466.12 samples/sec Loss 8.0383 LearningRate 0.0725 Epoch: 9 Global Step: 50340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:09:00,955-Speed 3426.98 samples/sec Loss 7.9967 LearningRate 0.0724 Epoch: 9 Global Step: 50350 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-01-19 22:09:03,915-Speed 3461.09 samples/sec Loss 8.0693 LearningRate 0.0724 Epoch: 9 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:09:06,867-Speed 3468.81 samples/sec Loss 8.0516 LearningRate 0.0724 Epoch: 9 Global Step: 50370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:09:09,819-Speed 3470.17 samples/sec Loss 7.9391 LearningRate 0.0724 Epoch: 9 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:12,832-Speed 3399.36 samples/sec Loss 8.1990 LearningRate 0.0724 Epoch: 9 Global Step: 50390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:15,797-Speed 3454.20 samples/sec Loss 8.0749 LearningRate 0.0723 Epoch: 9 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:18,763-Speed 3454.59 samples/sec Loss 7.9024 LearningRate 0.0723 Epoch: 9 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:21,727-Speed 3454.91 samples/sec Loss 8.0586 LearningRate 0.0723 Epoch: 9 Global Step: 50420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:24,714-Speed 3430.29 samples/sec Loss 7.9995 LearningRate 0.0723 Epoch: 9 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:27,685-Speed 3447.18 samples/sec Loss 8.1358 LearningRate 0.0723 Epoch: 9 Global Step: 50440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:30,719-Speed 3375.45 samples/sec Loss 8.0139 LearningRate 0.0722 Epoch: 9 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:33,758-Speed 3371.21 samples/sec Loss 7.9345 LearningRate 0.0722 Epoch: 9 Global Step: 50460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:36,766-Speed 3404.09 samples/sec Loss 8.1064 LearningRate 0.0722 Epoch: 9 Global Step: 50470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 
22:09:39,701-Speed 3490.64 samples/sec Loss 7.9316 LearningRate 0.0722 Epoch: 9 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:09:42,637-Speed 3489.13 samples/sec Loss 7.8710 LearningRate 0.0722 Epoch: 9 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:09:45,621-Speed 3432.99 samples/sec Loss 8.0335 LearningRate 0.0722 Epoch: 9 Global Step: 50500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:09:48,689-Speed 3337.96 samples/sec Loss 7.9680 LearningRate 0.0721 Epoch: 9 Global Step: 50510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:09:51,711-Speed 3389.38 samples/sec Loss 8.0728 LearningRate 0.0721 Epoch: 9 Global Step: 50520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:09:54,735-Speed 3387.69 samples/sec Loss 7.9478 LearningRate 0.0721 Epoch: 9 Global Step: 50530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:09:57,731-Speed 3418.36 samples/sec Loss 7.8421 LearningRate 0.0721 Epoch: 9 Global Step: 50540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:10:00,782-Speed 3358.07 samples/sec Loss 7.9172 LearningRate 0.0721 Epoch: 9 Global Step: 50550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:10:03,784-Speed 3411.43 samples/sec Loss 8.0505 LearningRate 0.0720 Epoch: 9 Global Step: 50560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:10:06,812-Speed 3383.03 samples/sec Loss 7.9998 LearningRate 0.0720 Epoch: 9 Global Step: 50570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:10:09,752-Speed 3484.04 samples/sec Loss 8.0657 LearningRate 0.0720 Epoch: 9 Global Step: 50580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:10:21,845-Speed 846.85 samples/sec Loss 7.1257 LearningRate 0.0720 Epoch: 10 Global Step: 50590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:10:25,010-Speed 3238.64 samples/sec Loss 7.2520 LearningRate 
0.0720 Epoch: 10 Global Step: 50600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:27,955-Speed 3477.90 samples/sec Loss 7.2658 LearningRate 0.0719 Epoch: 10 Global Step: 50610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:30,978-Speed 3388.66 samples/sec Loss 7.3099 LearningRate 0.0719 Epoch: 10 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:33,945-Speed 3451.73 samples/sec Loss 7.1723 LearningRate 0.0719 Epoch: 10 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:36,871-Speed 3501.34 samples/sec Loss 7.1997 LearningRate 0.0719 Epoch: 10 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:39,804-Speed 3492.43 samples/sec Loss 7.2788 LearningRate 0.0719 Epoch: 10 Global Step: 50650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:42,756-Speed 3469.17 samples/sec Loss 7.2640 LearningRate 0.0718 Epoch: 10 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:45,700-Speed 3479.10 samples/sec Loss 7.3617 LearningRate 0.0718 Epoch: 10 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:48,643-Speed 3480.39 samples/sec Loss 7.3469 LearningRate 0.0718 Epoch: 10 Global Step: 50680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:51,591-Speed 3474.96 samples/sec Loss 7.3011 LearningRate 0.0718 Epoch: 10 Global Step: 50690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:10:54,558-Speed 3452.66 samples/sec Loss 7.2052 LearningRate 0.0718 Epoch: 10 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:10:57,511-Speed 3468.54 samples/sec Loss 7.2956 LearningRate 0.0718 Epoch: 10 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:11:00,531-Speed 3391.29 samples/sec Loss 7.4715 LearningRate 0.0717 Epoch: 10 Global Step: 50720 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-01-19 22:11:03,482-Speed 3471.61 samples/sec Loss 7.3875 LearningRate 0.0717 Epoch: 10 Global Step: 50730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:11:06,434-Speed 3469.91 samples/sec Loss 7.4684 LearningRate 0.0717 Epoch: 10 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:11:09,423-Speed 3426.57 samples/sec Loss 7.4375 LearningRate 0.0717 Epoch: 10 Global Step: 50750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:11:12,444-Speed 3390.92 samples/sec Loss 7.3442 LearningRate 0.0717 Epoch: 10 Global Step: 50760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:11:15,387-Speed 3480.11 samples/sec Loss 7.3571 LearningRate 0.0716 Epoch: 10 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:11:18,349-Speed 3458.11 samples/sec Loss 7.3391 LearningRate 0.0716 Epoch: 10 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:11:21,357-Speed 3404.56 samples/sec Loss 7.4829 LearningRate 0.0716 Epoch: 10 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:11:24,309-Speed 3470.95 samples/sec Loss 7.4987 LearningRate 0.0716 Epoch: 10 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:11:27,376-Speed 3339.11 samples/sec Loss 7.5669 LearningRate 0.0716 Epoch: 10 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:11:30,414-Speed 3371.08 samples/sec Loss 7.6416 LearningRate 0.0715 Epoch: 10 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:11:33,357-Speed 3480.65 samples/sec Loss 7.4805 LearningRate 0.0715 Epoch: 10 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:11:36,303-Speed 3477.78 samples/sec Loss 7.4252 LearningRate 0.0715 Epoch: 10 Global Step: 50840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 
22:11:39,383-Speed 3325.12 samples/sec Loss 7.4165 LearningRate 0.0715 Epoch: 10 Global Step: 50850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:11:42,329-Speed 3476.73 samples/sec Loss 7.3842 LearningRate 0.0715 Epoch: 10 Global Step: 50860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:11:45,333-Speed 3410.30 samples/sec Loss 7.5264 LearningRate 0.0715 Epoch: 10 Global Step: 50870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:11:48,287-Speed 3467.22 samples/sec Loss 7.5892 LearningRate 0.0714 Epoch: 10 Global Step: 50880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:11:51,253-Speed 3452.78 samples/sec Loss 7.5745 LearningRate 0.0714 Epoch: 10 Global Step: 50890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:11:54,214-Speed 3460.59 samples/sec Loss 7.6046 LearningRate 0.0714 Epoch: 10 Global Step: 50900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:11:57,171-Speed 3464.01 samples/sec Loss 7.4018 LearningRate 0.0714 Epoch: 10 Global Step: 50910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:12:00,135-Speed 3455.66 samples/sec Loss 7.4788 LearningRate 0.0714 Epoch: 10 Global Step: 50920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:12:03,086-Speed 3471.63 samples/sec Loss 7.5020 LearningRate 0.0713 Epoch: 10 Global Step: 50930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:12:06,037-Speed 3470.22 samples/sec Loss 7.6619 LearningRate 0.0713 Epoch: 10 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:09,010-Speed 3445.35 samples/sec Loss 7.5844 LearningRate 0.0713 Epoch: 10 Global Step: 50950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:11,982-Speed 3446.65 samples/sec Loss 7.5923 LearningRate 0.0713 Epoch: 10 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:14,930-Speed 3474.87 samples/sec Loss 7.4049 
LearningRate 0.0713 Epoch: 10 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:17,893-Speed 3456.47 samples/sec Loss 7.5031 LearningRate 0.0712 Epoch: 10 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:20,851-Speed 3462.80 samples/sec Loss 7.7354 LearningRate 0.0712 Epoch: 10 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:23,847-Speed 3418.59 samples/sec Loss 7.6002 LearningRate 0.0712 Epoch: 10 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:26,809-Speed 3458.42 samples/sec Loss 7.8623 LearningRate 0.0712 Epoch: 10 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:29,788-Speed 3438.99 samples/sec Loss 7.7377 LearningRate 0.0712 Epoch: 10 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:32,776-Speed 3426.72 samples/sec Loss 7.7368 LearningRate 0.0711 Epoch: 10 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:12:35,738-Speed 3458.15 samples/sec Loss 7.6660 LearningRate 0.0711 Epoch: 10 Global Step: 51040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:12:38,824-Speed 3319.91 samples/sec Loss 7.6151 LearningRate 0.0711 Epoch: 10 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:12:41,929-Speed 3298.13 samples/sec Loss 7.8114 LearningRate 0.0711 Epoch: 10 Global Step: 51060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:12:44,899-Speed 3448.66 samples/sec Loss 7.4854 LearningRate 0.0711 Epoch: 10 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:12:47,907-Speed 3405.05 samples/sec Loss 7.5983 LearningRate 0.0711 Epoch: 10 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:12:50,877-Speed 3448.90 samples/sec Loss 7.6735 LearningRate 0.0710 Epoch: 10 Global Step: 51090 
Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:12:53,858-Speed 3436.25 samples/sec Loss 7.6388 LearningRate 0.0710 Epoch: 10 Global Step: 51100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:12:56,823-Speed 3454.74 samples/sec Loss 7.4472 LearningRate 0.0710 Epoch: 10 Global Step: 51110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:12:59,800-Speed 3441.13 samples/sec Loss 7.6069 LearningRate 0.0710 Epoch: 10 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:02,762-Speed 3456.96 samples/sec Loss 7.6634 LearningRate 0.0710 Epoch: 10 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:05,732-Speed 3448.58 samples/sec Loss 7.8536 LearningRate 0.0709 Epoch: 10 Global Step: 51140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:08,727-Speed 3420.61 samples/sec Loss 7.5661 LearningRate 0.0709 Epoch: 10 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:11,736-Speed 3404.38 samples/sec Loss 7.6742 LearningRate 0.0709 Epoch: 10 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:14,756-Speed 3391.15 samples/sec Loss 7.7484 LearningRate 0.0709 Epoch: 10 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:17,718-Speed 3458.56 samples/sec Loss 7.7687 LearningRate 0.0709 Epoch: 10 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:20,675-Speed 3463.61 samples/sec Loss 7.8326 LearningRate 0.0708 Epoch: 10 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:23,641-Speed 3453.11 samples/sec Loss 7.8201 LearningRate 0.0708 Epoch: 10 Global Step: 51200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:13:26,577-Speed 3489.46 samples/sec Loss 7.6559 LearningRate 0.0708 Epoch: 10 Global Step: 51210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-01-19 22:13:29,528-Speed 3471.10 samples/sec Loss 7.7365 LearningRate 0.0708 Epoch: 10 Global Step: 51220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:32,483-Speed 3465.20 samples/sec Loss 7.7908 LearningRate 0.0708 Epoch: 10 Global Step: 51230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:35,453-Speed 3449.13 samples/sec Loss 7.9679 LearningRate 0.0708 Epoch: 10 Global Step: 51240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:38,468-Speed 3397.07 samples/sec Loss 7.9095 LearningRate 0.0707 Epoch: 10 Global Step: 51250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:41,503-Speed 3374.81 samples/sec Loss 7.8521 LearningRate 0.0707 Epoch: 10 Global Step: 51260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:44,484-Speed 3435.96 samples/sec Loss 7.7507 LearningRate 0.0707 Epoch: 10 Global Step: 51270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:47,448-Speed 3455.83 samples/sec Loss 7.7362 LearningRate 0.0707 Epoch: 10 Global Step: 51280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:50,404-Speed 3464.80 samples/sec Loss 7.6946 LearningRate 0.0707 Epoch: 10 Global Step: 51290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:53,384-Speed 3437.82 samples/sec Loss 7.6534 LearningRate 0.0706 Epoch: 10 Global Step: 51300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:56,351-Speed 3452.40 samples/sec Loss 7.5280 LearningRate 0.0706 Epoch: 10 Global Step: 51310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:13:59,334-Speed 3432.97 samples/sec Loss 7.6896 LearningRate 0.0706 Epoch: 10 Global Step: 51320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:02,326-Speed 3423.28 samples/sec Loss 7.7010 LearningRate 0.0706 Epoch: 10 Global Step: 51330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:05,313-Speed 3430.10 
samples/sec Loss 7.8107 LearningRate 0.0706 Epoch: 10 Global Step: 51340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:08,281-Speed 3450.00 samples/sec Loss 7.8465 LearningRate 0.0705 Epoch: 10 Global Step: 51350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:11,233-Speed 3470.68 samples/sec Loss 7.6849 LearningRate 0.0705 Epoch: 10 Global Step: 51360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:14,197-Speed 3455.99 samples/sec Loss 7.8222 LearningRate 0.0705 Epoch: 10 Global Step: 51370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:17,142-Speed 3477.44 samples/sec Loss 7.7208 LearningRate 0.0705 Epoch: 10 Global Step: 51380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:20,179-Speed 3373.29 samples/sec Loss 7.8381 LearningRate 0.0705 Epoch: 10 Global Step: 51390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:23,159-Speed 3436.58 samples/sec Loss 7.8430 LearningRate 0.0705 Epoch: 10 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:26,137-Speed 3439.82 samples/sec Loss 7.6736 LearningRate 0.0704 Epoch: 10 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:29,126-Speed 3426.83 samples/sec Loss 7.7214 LearningRate 0.0704 Epoch: 10 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:32,090-Speed 3455.90 samples/sec Loss 7.8454 LearningRate 0.0704 Epoch: 10 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:35,075-Speed 3430.87 samples/sec Loss 7.7162 LearningRate 0.0704 Epoch: 10 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:38,015-Speed 3483.52 samples/sec Loss 7.7901 LearningRate 0.0704 Epoch: 10 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:40,978-Speed 3457.99 samples/sec Loss 7.8892 LearningRate 0.0703 Epoch: 
10 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:14:43,967-Speed 3426.99 samples/sec Loss 7.9086 LearningRate 0.0703 Epoch: 10 Global Step: 51470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:46,946-Speed 3439.09 samples/sec Loss 7.7672 LearningRate 0.0703 Epoch: 10 Global Step: 51480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:49,942-Speed 3420.05 samples/sec Loss 7.7915 LearningRate 0.0703 Epoch: 10 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:52,940-Speed 3415.99 samples/sec Loss 7.7051 LearningRate 0.0703 Epoch: 10 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:55,933-Speed 3422.29 samples/sec Loss 7.8388 LearningRate 0.0702 Epoch: 10 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:14:58,896-Speed 3456.17 samples/sec Loss 7.9194 LearningRate 0.0702 Epoch: 10 Global Step: 51520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:15:01,889-Speed 3422.33 samples/sec Loss 7.7473 LearningRate 0.0702 Epoch: 10 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:15:04,868-Speed 3438.74 samples/sec Loss 7.7166 LearningRate 0.0702 Epoch: 10 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:15:07,927-Speed 3348.76 samples/sec Loss 7.8269 LearningRate 0.0702 Epoch: 10 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:15:10,894-Speed 3452.91 samples/sec Loss 7.7150 LearningRate 0.0702 Epoch: 10 Global Step: 51560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:15:13,841-Speed 3476.12 samples/sec Loss 7.6799 LearningRate 0.0701 Epoch: 10 Global Step: 51570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:15:16,842-Speed 3412.56 samples/sec Loss 7.8247 LearningRate 0.0701 Epoch: 10 Global Step: 51580 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-01-19 22:15:19,815-Speed 3444.88 samples/sec Loss 7.8548 LearningRate 0.0701 Epoch: 10 Global Step: 51590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:15:22,771-Speed 3465.94 samples/sec Loss 7.6813 LearningRate 0.0701 Epoch: 10 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:25,752-Speed 3435.48 samples/sec Loss 7.7046 LearningRate 0.0701 Epoch: 10 Global Step: 51610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:28,709-Speed 3465.28 samples/sec Loss 7.6044 LearningRate 0.0700 Epoch: 10 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:31,702-Speed 3421.60 samples/sec Loss 7.7641 LearningRate 0.0700 Epoch: 10 Global Step: 51630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:34,676-Speed 3444.65 samples/sec Loss 7.8075 LearningRate 0.0700 Epoch: 10 Global Step: 51640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:37,666-Speed 3425.30 samples/sec Loss 7.6976 LearningRate 0.0700 Epoch: 10 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:40,643-Speed 3440.22 samples/sec Loss 7.5818 LearningRate 0.0700 Epoch: 10 Global Step: 51660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:43,650-Speed 3406.92 samples/sec Loss 7.6387 LearningRate 0.0699 Epoch: 10 Global Step: 51670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:46,635-Speed 3431.26 samples/sec Loss 7.8904 LearningRate 0.0699 Epoch: 10 Global Step: 51680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:49,730-Speed 3309.71 samples/sec Loss 7.8152 LearningRate 0.0699 Epoch: 10 Global Step: 51690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:52,821-Speed 3312.78 samples/sec Loss 7.8797 LearningRate 0.0699 Epoch: 10 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 
22:15:55,883-Speed 3345.11 samples/sec Loss 7.7659 LearningRate 0.0699 Epoch: 10 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:15:58,975-Speed 3313.33 samples/sec Loss 7.6991 LearningRate 0.0699 Epoch: 10 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:02,029-Speed 3353.30 samples/sec Loss 7.7855 LearningRate 0.0698 Epoch: 10 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:05,130-Speed 3304.16 samples/sec Loss 7.8341 LearningRate 0.0698 Epoch: 10 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:08,108-Speed 3438.54 samples/sec Loss 7.8211 LearningRate 0.0698 Epoch: 10 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:11,097-Speed 3426.64 samples/sec Loss 7.8218 LearningRate 0.0698 Epoch: 10 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:14,135-Speed 3371.64 samples/sec Loss 7.8671 LearningRate 0.0698 Epoch: 10 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:17,104-Speed 3449.80 samples/sec Loss 7.7716 LearningRate 0.0697 Epoch: 10 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:20,115-Speed 3402.24 samples/sec Loss 7.9923 LearningRate 0.0697 Epoch: 10 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:23,094-Speed 3437.66 samples/sec Loss 7.7753 LearningRate 0.0697 Epoch: 10 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:16:26,167-Speed 3333.63 samples/sec Loss 7.8285 LearningRate 0.0697 Epoch: 10 Global Step: 51810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:29,219-Speed 3355.92 samples/sec Loss 7.7900 LearningRate 0.0697 Epoch: 10 Global Step: 51820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:32,203-Speed 3432.86 samples/sec Loss 7.7953 
LearningRate 0.0696 Epoch: 10 Global Step: 51830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:35,228-Speed 3386.24 samples/sec Loss 7.6931 LearningRate 0.0696 Epoch: 10 Global Step: 51840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:38,253-Speed 3386.18 samples/sec Loss 7.7278 LearningRate 0.0696 Epoch: 10 Global Step: 51850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:41,297-Speed 3364.93 samples/sec Loss 7.8971 LearningRate 0.0696 Epoch: 10 Global Step: 51860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:44,316-Speed 3392.37 samples/sec Loss 7.6865 LearningRate 0.0696 Epoch: 10 Global Step: 51870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:47,289-Speed 3444.99 samples/sec Loss 7.8553 LearningRate 0.0696 Epoch: 10 Global Step: 51880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:50,240-Speed 3471.16 samples/sec Loss 7.7631 LearningRate 0.0695 Epoch: 10 Global Step: 51890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:53,198-Speed 3462.69 samples/sec Loss 7.7889 LearningRate 0.0695 Epoch: 10 Global Step: 51900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:16:56,163-Speed 3454.40 samples/sec Loss 7.7466 LearningRate 0.0695 Epoch: 10 Global Step: 51910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 22:16:59,115-Speed 3470.10 samples/sec Loss 7.7670 LearningRate 0.0695 Epoch: 10 Global Step: 51920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 22:17:02,073-Speed 3462.52 samples/sec Loss 7.7336 LearningRate 0.0695 Epoch: 10 Global Step: 51930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:17:05,100-Speed 3383.85 samples/sec Loss 7.7737 LearningRate 0.0694 Epoch: 10 Global Step: 51940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:17:08,098-Speed 3416.71 samples/sec Loss 7.8772 LearningRate 0.0694 Epoch: 10 Global Step: 
51950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:17:11,142-Speed 3364.47 samples/sec Loss 7.8426 LearningRate 0.0694 Epoch: 10 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:17:14,123-Speed 3437.38 samples/sec Loss 7.9020 LearningRate 0.0694 Epoch: 10 Global Step: 51970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:17:17,208-Speed 3319.66 samples/sec Loss 7.8825 LearningRate 0.0694 Epoch: 10 Global Step: 51980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:17:20,164-Speed 3464.72 samples/sec Loss 7.6608 LearningRate 0.0693 Epoch: 10 Global Step: 51990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:17:23,186-Speed 3389.33 samples/sec Loss 7.7515 LearningRate 0.0693 Epoch: 10 Global Step: 52000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:18:06,230-[lfw][52000]XNorm: 21.798909 Training: 2022-01-19 22:18:06,230-[lfw][52000]Accuracy-Flip: 0.99667+-0.00387 Training: 2022-01-19 22:18:06,231-[lfw][52000]Accuracy-Highest: 0.99767 Training: 2022-01-19 22:18:56,015-[cfp_fp][52000]XNorm: 19.407096 Training: 2022-01-19 22:18:56,016-[cfp_fp][52000]Accuracy-Flip: 0.96386+-0.01096 Training: 2022-01-19 22:18:56,016-[cfp_fp][52000]Accuracy-Highest: 0.96386 Training: 2022-01-19 22:19:38,828-[agedb_30][52000]XNorm: 21.573051 Training: 2022-01-19 22:19:38,829-[agedb_30][52000]Accuracy-Flip: 0.97667+-0.00738 Training: 2022-01-19 22:19:38,829-[agedb_30][52000]Accuracy-Highest: 0.97667 Training: 2022-01-19 22:19:41,860-Speed 73.84 samples/sec Loss 7.7240 LearningRate 0.0693 Epoch: 10 Global Step: 52010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:19:44,907-Speed 3361.54 samples/sec Loss 7.8344 LearningRate 0.0693 Epoch: 10 Global Step: 52020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:19:47,867-Speed 3460.10 samples/sec Loss 7.7089 LearningRate 0.0693 Epoch: 10 Global Step: 52030 Fp16 Grad Scale: 16384 Required: 
8 hours Training: 2022-01-19 22:19:50,848-Speed 3436.13 samples/sec Loss 7.7517 LearningRate 0.0693 Epoch: 10 Global Step: 52040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:19:53,833-Speed 3431.83 samples/sec Loss 7.7319 LearningRate 0.0692 Epoch: 10 Global Step: 52050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:19:56,808-Speed 3442.93 samples/sec Loss 7.9296 LearningRate 0.0692 Epoch: 10 Global Step: 52060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:19:59,766-Speed 3462.48 samples/sec Loss 7.7399 LearningRate 0.0692 Epoch: 10 Global Step: 52070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:20:02,729-Speed 3456.71 samples/sec Loss 7.7974 LearningRate 0.0692 Epoch: 10 Global Step: 52080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-19 22:20:05,701-Speed 3448.43 samples/sec Loss 7.7043 LearningRate 0.0692 Epoch: 10 Global Step: 52090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:08,658-Speed 3463.03 samples/sec Loss 7.7689 LearningRate 0.0691 Epoch: 10 Global Step: 52100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:11,618-Speed 3461.76 samples/sec Loss 7.7777 LearningRate 0.0691 Epoch: 10 Global Step: 52110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:14,602-Speed 3432.83 samples/sec Loss 7.7660 LearningRate 0.0691 Epoch: 10 Global Step: 52120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:17,562-Speed 3459.88 samples/sec Loss 7.7786 LearningRate 0.0691 Epoch: 10 Global Step: 52130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:20,529-Speed 3451.79 samples/sec Loss 7.8323 LearningRate 0.0691 Epoch: 10 Global Step: 52140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:23,512-Speed 3433.65 samples/sec Loss 7.7892 LearningRate 0.0691 Epoch: 10 Global Step: 52150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:26,491-Speed 3438.84 
samples/sec Loss 7.6695 LearningRate 0.0690 Epoch: 10 Global Step: 52160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:29,462-Speed 3447.44 samples/sec Loss 7.7052 LearningRate 0.0690 Epoch: 10 Global Step: 52170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:32,427-Speed 3455.00 samples/sec Loss 7.7398 LearningRate 0.0690 Epoch: 10 Global Step: 52180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-19 22:20:35,484-Speed 3350.79 samples/sec Loss 7.8846 LearningRate 0.0690 Epoch: 10 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:20:38,610-Speed 3276.92 samples/sec Loss 7.8776 LearningRate 0.0690 Epoch: 10 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:20:41,682-Speed 3334.04 samples/sec Loss 7.6448 LearningRate 0.0689 Epoch: 10 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:20:44,674-Speed 3423.59 samples/sec Loss 7.7252 LearningRate 0.0689 Epoch: 10 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:20:47,693-Speed 3391.80 samples/sec Loss 7.7851 LearningRate 0.0689 Epoch: 10 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:20:50,669-Speed 3443.27 samples/sec Loss 7.7580 LearningRate 0.0689 Epoch: 10 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:20:53,654-Speed 3431.76 samples/sec Loss 7.6292 LearningRate 0.0689 Epoch: 10 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:20:56,635-Speed 3437.28 samples/sec Loss 7.6422 LearningRate 0.0688 Epoch: 10 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:20:59,613-Speed 3438.81 samples/sec Loss 7.8204 LearningRate 0.0688 Epoch: 10 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:21:02,564-Speed 3471.43 samples/sec Loss 7.6163 LearningRate 0.0688 Epoch: 10 
Global Step: 52280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:21:05,547-Speed 3434.08 samples/sec Loss 7.8232 LearningRate 0.0688 Epoch: 10 Global Step: 52290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:08,519-Speed 3446.28 samples/sec Loss 7.7255 LearningRate 0.0688 Epoch: 10 Global Step: 52300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:11,490-Speed 3447.77 samples/sec Loss 7.6720 LearningRate 0.0688 Epoch: 10 Global Step: 52310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:14,467-Speed 3440.15 samples/sec Loss 7.7675 LearningRate 0.0687 Epoch: 10 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:17,441-Speed 3444.24 samples/sec Loss 7.8204 LearningRate 0.0687 Epoch: 10 Global Step: 52330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:20,403-Speed 3459.10 samples/sec Loss 7.6822 LearningRate 0.0687 Epoch: 10 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:23,374-Speed 3447.59 samples/sec Loss 7.7344 LearningRate 0.0687 Epoch: 10 Global Step: 52350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:26,382-Speed 3405.96 samples/sec Loss 7.8734 LearningRate 0.0687 Epoch: 10 Global Step: 52360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:29,379-Speed 3417.89 samples/sec Loss 7.7497 LearningRate 0.0686 Epoch: 10 Global Step: 52370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:32,334-Speed 3465.10 samples/sec Loss 7.6605 LearningRate 0.0686 Epoch: 10 Global Step: 52380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:35,334-Speed 3415.42 samples/sec Loss 7.5439 LearningRate 0.0686 Epoch: 10 Global Step: 52390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-19 22:21:38,289-Speed 3464.94 samples/sec Loss 7.7412 LearningRate 0.0686 Epoch: 10 Global Step: 52400 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-19 22:21:41,268-Speed 3438.50 samples/sec Loss 7.7699 LearningRate 0.0686 Epoch: 10 Global Step: 52410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-19 22:21:44,226-Speed 3463.09 samples/sec Loss 7.8356 LearningRate 0.0686 Epoch: 10 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:21:47,232-Speed 3407.07 samples/sec Loss 7.7506 LearningRate 0.0685 Epoch: 10 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:21:50,266-Speed 3376.34 samples/sec Loss 7.7222 LearningRate 0.0685 Epoch: 10 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:21:53,284-Speed 3394.16 samples/sec Loss 7.8726 LearningRate 0.0685 Epoch: 10 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:21:56,323-Speed 3371.71 samples/sec Loss 7.7522 LearningRate 0.0685 Epoch: 10 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:21:59,309-Speed 3430.35 samples/sec Loss 7.6605 LearningRate 0.0685 Epoch: 10 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:22:02,283-Speed 3443.19 samples/sec Loss 7.8042 LearningRate 0.0684 Epoch: 10 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:22:05,286-Speed 3411.55 samples/sec Loss 7.6803 LearningRate 0.0684 Epoch: 10 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:22:08,306-Speed 3391.37 samples/sec Loss 7.6437 LearningRate 0.0684 Epoch: 10 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:22:11,453-Speed 3254.58 samples/sec Loss 7.6388 LearningRate 0.0684 Epoch: 10 Global Step: 52510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-19 22:22:14,434-Speed 3435.93 samples/sec Loss 7.8317 LearningRate 0.0684 Epoch: 10 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 
22:22:17,408-Speed 3444.23 samples/sec Loss 7.8015 LearningRate 0.0683 Epoch: 10 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:20,394-Speed 3430.23 samples/sec Loss 7.8557 LearningRate 0.0683 Epoch: 10 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:23,372-Speed 3440.07 samples/sec Loss 7.8225 LearningRate 0.0683 Epoch: 10 Global Step: 52550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:26,385-Speed 3399.28 samples/sec Loss 7.6448 LearningRate 0.0683 Epoch: 10 Global Step: 52560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:29,368-Speed 3434.36 samples/sec Loss 7.7924 LearningRate 0.0683 Epoch: 10 Global Step: 52570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:32,379-Speed 3401.27 samples/sec Loss 7.7071 LearningRate 0.0683 Epoch: 10 Global Step: 52580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:35,337-Speed 3462.76 samples/sec Loss 7.6497 LearningRate 0.0682 Epoch: 10 Global Step: 52590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:38,322-Speed 3431.71 samples/sec Loss 7.6617 LearningRate 0.0682 Epoch: 10 Global Step: 52600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:41,280-Speed 3462.51 samples/sec Loss 7.7424 LearningRate 0.0682 Epoch: 10 Global Step: 52610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:44,223-Speed 3480.60 samples/sec Loss 7.6343 LearningRate 0.0682 Epoch: 10 Global Step: 52620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:47,197-Speed 3443.28 samples/sec Loss 7.6693 LearningRate 0.0682 Epoch: 10 Global Step: 52630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:50,190-Speed 3423.38 samples/sec Loss 7.8274 LearningRate 0.0681 Epoch: 10 Global Step: 52640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:53,157-Speed 3451.31 samples/sec Loss 
7.7396 LearningRate 0.0681 Epoch: 10 Global Step: 52650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:22:56,083-Speed 3502.15 samples/sec Loss 7.6960 LearningRate 0.0681 Epoch: 10 Global Step: 52660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:22:59,051-Speed 3450.03 samples/sec Loss 7.6258 LearningRate 0.0681 Epoch: 10 Global Step: 52670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:02,090-Speed 3370.57 samples/sec Loss 7.9738 LearningRate 0.0681 Epoch: 10 Global Step: 52680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:05,118-Speed 3382.78 samples/sec Loss 7.5983 LearningRate 0.0681 Epoch: 10 Global Step: 52690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:08,105-Speed 3428.37 samples/sec Loss 7.8042 LearningRate 0.0680 Epoch: 10 Global Step: 52700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:11,089-Speed 3433.30 samples/sec Loss 7.7328 LearningRate 0.0680 Epoch: 10 Global Step: 52710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:14,219-Speed 3271.87 samples/sec Loss 7.6367 LearningRate 0.0680 Epoch: 10 Global Step: 52720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:17,374-Speed 3246.83 samples/sec Loss 7.7200 LearningRate 0.0680 Epoch: 10 Global Step: 52730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:20,340-Speed 3453.26 samples/sec Loss 7.6043 LearningRate 0.0680 Epoch: 10 Global Step: 52740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:23,332-Speed 3423.09 samples/sec Loss 7.6077 LearningRate 0.0679 Epoch: 10 Global Step: 52750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:23:26,409-Speed 3329.72 samples/sec Loss 7.7749 LearningRate 0.0679 Epoch: 10 Global Step: 52760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:29,392-Speed 3433.56 samples/sec Loss 7.6443 LearningRate 0.0679 Epoch: 10 Global Step: 
52770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:32,358-Speed 3452.97 samples/sec Loss 7.6674 LearningRate 0.0679 Epoch: 10 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:35,355-Speed 3418.29 samples/sec Loss 7.7806 LearningRate 0.0679 Epoch: 10 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:38,319-Speed 3455.15 samples/sec Loss 7.7705 LearningRate 0.0678 Epoch: 10 Global Step: 52800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:41,381-Speed 3345.52 samples/sec Loss 7.6318 LearningRate 0.0678 Epoch: 10 Global Step: 52810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:44,420-Speed 3370.53 samples/sec Loss 7.7972 LearningRate 0.0678 Epoch: 10 Global Step: 52820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:47,449-Speed 3381.67 samples/sec Loss 7.7400 LearningRate 0.0678 Epoch: 10 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:50,432-Speed 3434.82 samples/sec Loss 7.8516 LearningRate 0.0678 Epoch: 10 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:53,421-Speed 3426.16 samples/sec Loss 7.7799 LearningRate 0.0678 Epoch: 10 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:23:56,392-Speed 3448.14 samples/sec Loss 7.7554 LearningRate 0.0677 Epoch: 10 Global Step: 52860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:23:59,503-Speed 3291.54 samples/sec Loss 7.8021 LearningRate 0.0677 Epoch: 10 Global Step: 52870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:02,507-Speed 3409.54 samples/sec Loss 7.7403 LearningRate 0.0677 Epoch: 10 Global Step: 52880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:05,532-Speed 3386.51 samples/sec Loss 7.8284 LearningRate 0.0677 Epoch: 10 Global Step: 52890 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-01-19 22:24:08,523-Speed 3424.85 samples/sec Loss 7.8716 LearningRate 0.0677 Epoch: 10 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:11,487-Speed 3456.48 samples/sec Loss 7.5422 LearningRate 0.0676 Epoch: 10 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:14,485-Speed 3416.14 samples/sec Loss 7.7979 LearningRate 0.0676 Epoch: 10 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:17,461-Speed 3441.20 samples/sec Loss 7.7162 LearningRate 0.0676 Epoch: 10 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:20,436-Speed 3443.00 samples/sec Loss 7.6723 LearningRate 0.0676 Epoch: 10 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:23,515-Speed 3326.38 samples/sec Loss 7.8505 LearningRate 0.0676 Epoch: 10 Global Step: 52950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:26,490-Speed 3443.91 samples/sec Loss 7.8528 LearningRate 0.0676 Epoch: 10 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:29,447-Speed 3463.90 samples/sec Loss 7.5655 LearningRate 0.0675 Epoch: 10 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:32,409-Speed 3457.22 samples/sec Loss 7.8663 LearningRate 0.0675 Epoch: 10 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:35,374-Speed 3455.80 samples/sec Loss 7.8492 LearningRate 0.0675 Epoch: 10 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:24:38,357-Speed 3432.63 samples/sec Loss 7.6093 LearningRate 0.0675 Epoch: 10 Global Step: 53000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:41,319-Speed 3459.55 samples/sec Loss 7.8221 LearningRate 0.0675 Epoch: 10 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:44,449-Speed 3273.09 
samples/sec Loss 7.7231 LearningRate 0.0674 Epoch: 10 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:47,502-Speed 3353.75 samples/sec Loss 7.8529 LearningRate 0.0674 Epoch: 10 Global Step: 53030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:50,483-Speed 3436.67 samples/sec Loss 7.6691 LearningRate 0.0674 Epoch: 10 Global Step: 53040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:53,465-Speed 3434.72 samples/sec Loss 7.6348 LearningRate 0.0674 Epoch: 10 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:56,463-Speed 3416.46 samples/sec Loss 7.6466 LearningRate 0.0674 Epoch: 10 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:24:59,461-Speed 3417.20 samples/sec Loss 7.7173 LearningRate 0.0674 Epoch: 10 Global Step: 53070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:02,445-Speed 3431.64 samples/sec Loss 7.7532 LearningRate 0.0673 Epoch: 10 Global Step: 53080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:05,424-Speed 3439.42 samples/sec Loss 7.6205 LearningRate 0.0673 Epoch: 10 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:08,376-Speed 3469.82 samples/sec Loss 7.5928 LearningRate 0.0673 Epoch: 10 Global Step: 53100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:11,469-Speed 3311.17 samples/sec Loss 7.7310 LearningRate 0.0673 Epoch: 10 Global Step: 53110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:14,489-Speed 3391.31 samples/sec Loss 7.7333 LearningRate 0.0673 Epoch: 10 Global Step: 53120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:17,458-Speed 3449.83 samples/sec Loss 7.7201 LearningRate 0.0672 Epoch: 10 Global Step: 53130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:20,421-Speed 3457.90 samples/sec Loss 7.6496 LearningRate 0.0672 
Epoch: 10 Global Step: 53140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:23,399-Speed 3438.91 samples/sec Loss 7.7930 LearningRate 0.0672 Epoch: 10 Global Step: 53150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:26,484-Speed 3320.74 samples/sec Loss 7.6601 LearningRate 0.0672 Epoch: 10 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:29,492-Speed 3404.24 samples/sec Loss 7.8622 LearningRate 0.0672 Epoch: 10 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:32,475-Speed 3433.43 samples/sec Loss 7.6009 LearningRate 0.0671 Epoch: 10 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:35,460-Speed 3432.19 samples/sec Loss 7.6458 LearningRate 0.0671 Epoch: 10 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:38,432-Speed 3446.67 samples/sec Loss 7.7325 LearningRate 0.0671 Epoch: 10 Global Step: 53200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:41,410-Speed 3439.41 samples/sec Loss 7.6833 LearningRate 0.0671 Epoch: 10 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:25:44,441-Speed 3378.61 samples/sec Loss 7.7108 LearningRate 0.0671 Epoch: 10 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:25:47,423-Speed 3435.74 samples/sec Loss 7.7752 LearningRate 0.0671 Epoch: 10 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:25:50,386-Speed 3456.87 samples/sec Loss 7.6248 LearningRate 0.0670 Epoch: 10 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:25:53,503-Speed 3285.21 samples/sec Loss 7.7950 LearningRate 0.0670 Epoch: 10 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:25:56,487-Speed 3433.59 samples/sec Loss 7.7534 LearningRate 0.0670 Epoch: 10 Global Step: 53260 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-01-19 22:25:59,453-Speed 3452.85 samples/sec Loss 7.7180 LearningRate 0.0670 Epoch: 10 Global Step: 53270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:02,450-Speed 3417.51 samples/sec Loss 7.5967 LearningRate 0.0670 Epoch: 10 Global Step: 53280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:05,491-Speed 3368.36 samples/sec Loss 7.6835 LearningRate 0.0669 Epoch: 10 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:08,508-Speed 3395.39 samples/sec Loss 7.7243 LearningRate 0.0669 Epoch: 10 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:11,505-Speed 3417.06 samples/sec Loss 7.8483 LearningRate 0.0669 Epoch: 10 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:14,559-Speed 3353.72 samples/sec Loss 7.5723 LearningRate 0.0669 Epoch: 10 Global Step: 53320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:17,643-Speed 3321.05 samples/sec Loss 7.6836 LearningRate 0.0669 Epoch: 10 Global Step: 53330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:20,617-Speed 3445.17 samples/sec Loss 7.6121 LearningRate 0.0669 Epoch: 10 Global Step: 53340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:23,610-Speed 3421.48 samples/sec Loss 7.7801 LearningRate 0.0668 Epoch: 10 Global Step: 53350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:26,594-Speed 3433.59 samples/sec Loss 7.7070 LearningRate 0.0668 Epoch: 10 Global Step: 53360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:29,571-Speed 3440.23 samples/sec Loss 7.6178 LearningRate 0.0668 Epoch: 10 Global Step: 53370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:32,553-Speed 3434.88 samples/sec Loss 7.5977 LearningRate 0.0668 Epoch: 10 Global Step: 53380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 
22:26:35,519-Speed 3453.32 samples/sec Loss 7.5951 LearningRate 0.0668 Epoch: 10 Global Step: 53390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:38,555-Speed 3373.62 samples/sec Loss 7.6069 LearningRate 0.0667 Epoch: 10 Global Step: 53400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:41,563-Speed 3407.27 samples/sec Loss 7.4791 LearningRate 0.0667 Epoch: 10 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:26:44,561-Speed 3415.91 samples/sec Loss 7.8404 LearningRate 0.0667 Epoch: 10 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:47,660-Speed 3304.96 samples/sec Loss 7.8143 LearningRate 0.0667 Epoch: 10 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:50,686-Speed 3385.64 samples/sec Loss 7.7153 LearningRate 0.0667 Epoch: 10 Global Step: 53440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:53,660-Speed 3443.23 samples/sec Loss 7.7992 LearningRate 0.0667 Epoch: 10 Global Step: 53450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:56,638-Speed 3441.04 samples/sec Loss 7.5483 LearningRate 0.0666 Epoch: 10 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:26:59,609-Speed 3447.41 samples/sec Loss 7.6671 LearningRate 0.0666 Epoch: 10 Global Step: 53470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:02,584-Speed 3442.27 samples/sec Loss 7.7242 LearningRate 0.0666 Epoch: 10 Global Step: 53480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:05,563-Speed 3438.85 samples/sec Loss 7.8690 LearningRate 0.0666 Epoch: 10 Global Step: 53490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:08,581-Speed 3393.19 samples/sec Loss 7.7593 LearningRate 0.0666 Epoch: 10 Global Step: 53500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:11,566-Speed 3432.01 samples/sec Loss 7.7608 
LearningRate 0.0665 Epoch: 10 Global Step: 53510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:14,541-Speed 3443.22 samples/sec Loss 7.6809 LearningRate 0.0665 Epoch: 10 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:27:17,550-Speed 3404.22 samples/sec Loss 7.9496 LearningRate 0.0665 Epoch: 10 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:27:20,555-Speed 3409.52 samples/sec Loss 7.6350 LearningRate 0.0665 Epoch: 10 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:27:23,563-Speed 3404.26 samples/sec Loss 7.8578 LearningRate 0.0665 Epoch: 10 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:27:26,548-Speed 3433.66 samples/sec Loss 7.6690 LearningRate 0.0665 Epoch: 10 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:27:29,586-Speed 3370.84 samples/sec Loss 7.7635 LearningRate 0.0664 Epoch: 10 Global Step: 53570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:27:32,623-Speed 3372.24 samples/sec Loss 7.6791 LearningRate 0.0664 Epoch: 10 Global Step: 53580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:27:35,716-Speed 3311.90 samples/sec Loss 7.7484 LearningRate 0.0664 Epoch: 10 Global Step: 53590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:27:38,680-Speed 3455.93 samples/sec Loss 7.5754 LearningRate 0.0664 Epoch: 10 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:41,656-Speed 3441.39 samples/sec Loss 7.6159 LearningRate 0.0664 Epoch: 10 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:44,645-Speed 3427.21 samples/sec Loss 7.7105 LearningRate 0.0663 Epoch: 10 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:47,659-Speed 3397.87 samples/sec Loss 7.6558 LearningRate 0.0663 Epoch: 10 Global Step: 
53630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:50,643-Speed 3433.44 samples/sec Loss 7.7184 LearningRate 0.0663 Epoch: 10 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:53,626-Speed 3433.73 samples/sec Loss 7.8071 LearningRate 0.0663 Epoch: 10 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:56,645-Speed 3393.11 samples/sec Loss 7.7100 LearningRate 0.0663 Epoch: 10 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:27:59,606-Speed 3459.88 samples/sec Loss 7.6842 LearningRate 0.0663 Epoch: 10 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:02,586-Speed 3436.69 samples/sec Loss 7.7880 LearningRate 0.0662 Epoch: 10 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:05,620-Speed 3375.41 samples/sec Loss 7.6112 LearningRate 0.0662 Epoch: 10 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:08,631-Speed 3401.52 samples/sec Loss 7.5974 LearningRate 0.0662 Epoch: 10 Global Step: 53700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:11,670-Speed 3371.73 samples/sec Loss 7.7479 LearningRate 0.0662 Epoch: 10 Global Step: 53710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:14,635-Speed 3453.36 samples/sec Loss 7.5857 LearningRate 0.0662 Epoch: 10 Global Step: 53720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:17,622-Speed 3429.98 samples/sec Loss 7.7029 LearningRate 0.0661 Epoch: 10 Global Step: 53730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:20,671-Speed 3359.38 samples/sec Loss 7.5812 LearningRate 0.0661 Epoch: 10 Global Step: 53740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:23,651-Speed 3437.54 samples/sec Loss 7.6269 LearningRate 0.0661 Epoch: 10 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-01-19 22:28:26,664-Speed 3398.81 samples/sec Loss 7.6325 LearningRate 0.0661 Epoch: 10 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:29,700-Speed 3374.46 samples/sec Loss 7.6291 LearningRate 0.0661 Epoch: 10 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:32,666-Speed 3452.46 samples/sec Loss 7.7312 LearningRate 0.0661 Epoch: 10 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:35,678-Speed 3400.28 samples/sec Loss 7.6778 LearningRate 0.0660 Epoch: 10 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:28:38,799-Speed 3281.99 samples/sec Loss 7.6710 LearningRate 0.0660 Epoch: 10 Global Step: 53800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-19 22:28:41,729-Speed 3496.51 samples/sec Loss 7.5907 LearningRate 0.0660 Epoch: 10 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:44,722-Speed 3421.67 samples/sec Loss 7.5440 LearningRate 0.0660 Epoch: 10 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:47,710-Speed 3428.36 samples/sec Loss 7.7688 LearningRate 0.0660 Epoch: 10 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:50,707-Speed 3418.17 samples/sec Loss 7.7443 LearningRate 0.0659 Epoch: 10 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:53,681-Speed 3443.67 samples/sec Loss 7.6585 LearningRate 0.0659 Epoch: 10 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:56,709-Speed 3382.26 samples/sec Loss 7.6939 LearningRate 0.0659 Epoch: 10 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:28:59,697-Speed 3428.00 samples/sec Loss 7.7364 LearningRate 0.0659 Epoch: 10 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:29:02,706-Speed 3404.49 
samples/sec Loss 7.5933 LearningRate 0.0659 Epoch: 10 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:29:05,700-Speed 3420.54 samples/sec Loss 7.7337 LearningRate 0.0659 Epoch: 10 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:29:08,706-Speed 3407.09 samples/sec Loss 7.9123 LearningRate 0.0658 Epoch: 10 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:29:11,742-Speed 3374.82 samples/sec Loss 7.6090 LearningRate 0.0658 Epoch: 10 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:14,810-Speed 3338.71 samples/sec Loss 7.7642 LearningRate 0.0658 Epoch: 10 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:17,782-Speed 3446.25 samples/sec Loss 7.6941 LearningRate 0.0658 Epoch: 10 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:20,770-Speed 3427.79 samples/sec Loss 7.7441 LearningRate 0.0658 Epoch: 10 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:23,751-Speed 3435.81 samples/sec Loss 7.5536 LearningRate 0.0657 Epoch: 10 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:26,764-Speed 3399.29 samples/sec Loss 7.9278 LearningRate 0.0657 Epoch: 10 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:29,744-Speed 3437.79 samples/sec Loss 7.6626 LearningRate 0.0657 Epoch: 10 Global Step: 53970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:32,710-Speed 3453.56 samples/sec Loss 7.4938 LearningRate 0.0657 Epoch: 10 Global Step: 53980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:35,700-Speed 3424.70 samples/sec Loss 7.7319 LearningRate 0.0657 Epoch: 10 Global Step: 53990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:29:38,692-Speed 3423.79 samples/sec Loss 7.7089 LearningRate 0.0657 
Epoch: 10 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:30:21,866-[lfw][54000]XNorm: 22.021960 Training: 2022-01-19 22:30:21,866-[lfw][54000]Accuracy-Flip: 0.99617+-0.00350 Training: 2022-01-19 22:30:21,867-[lfw][54000]Accuracy-Highest: 0.99767 Training: 2022-01-19 22:31:11,927-[cfp_fp][54000]XNorm: 19.378263 Training: 2022-01-19 22:31:11,949-[cfp_fp][54000]Accuracy-Flip: 0.95729+-0.01093 Training: 2022-01-19 22:31:11,950-[cfp_fp][54000]Accuracy-Highest: 0.96386 Training: 2022-01-19 22:31:55,179-[agedb_30][54000]XNorm: 21.680953 Training: 2022-01-19 22:31:55,180-[agedb_30][54000]Accuracy-Flip: 0.97283+-0.00610 Training: 2022-01-19 22:31:55,180-[agedb_30][54000]Accuracy-Highest: 0.97667 Training: 2022-01-19 22:31:58,156-Speed 73.42 samples/sec Loss 7.7757 LearningRate 0.0656 Epoch: 10 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:01,122-Speed 3452.48 samples/sec Loss 7.6487 LearningRate 0.0656 Epoch: 10 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:04,110-Speed 3427.62 samples/sec Loss 7.5181 LearningRate 0.0656 Epoch: 10 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:07,244-Speed 3268.41 samples/sec Loss 7.6602 LearningRate 0.0656 Epoch: 10 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:10,255-Speed 3402.28 samples/sec Loss 7.5286 LearningRate 0.0656 Epoch: 10 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:13,290-Speed 3374.43 samples/sec Loss 7.6962 LearningRate 0.0655 Epoch: 10 Global Step: 54060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:16,355-Speed 3342.02 samples/sec Loss 7.6857 LearningRate 0.0655 Epoch: 10 Global Step: 54070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:19,331-Speed 3441.57 samples/sec Loss 7.5365 LearningRate 0.0655 Epoch: 10 Global Step: 54080 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:22,339-Speed 3405.94 samples/sec Loss 7.5969 LearningRate 0.0655 Epoch: 10 Global Step: 54090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:25,333-Speed 3420.24 samples/sec Loss 7.7262 LearningRate 0.0655 Epoch: 10 Global Step: 54100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:32:28,330-Speed 3418.00 samples/sec Loss 7.5302 LearningRate 0.0655 Epoch: 10 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:31,417-Speed 3318.31 samples/sec Loss 7.6801 LearningRate 0.0654 Epoch: 10 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:34,463-Speed 3362.54 samples/sec Loss 7.5874 LearningRate 0.0654 Epoch: 10 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:37,441-Speed 3441.32 samples/sec Loss 7.5545 LearningRate 0.0654 Epoch: 10 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:40,434-Speed 3421.39 samples/sec Loss 7.5436 LearningRate 0.0654 Epoch: 10 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:43,422-Speed 3428.47 samples/sec Loss 7.4535 LearningRate 0.0654 Epoch: 10 Global Step: 54160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:46,402-Speed 3437.64 samples/sec Loss 7.5890 LearningRate 0.0653 Epoch: 10 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:49,387-Speed 3430.83 samples/sec Loss 7.6551 LearningRate 0.0653 Epoch: 10 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:52,349-Speed 3458.78 samples/sec Loss 7.6003 LearningRate 0.0653 Epoch: 10 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:32:55,319-Speed 3448.11 samples/sec Loss 7.5685 LearningRate 0.0653 Epoch: 10 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-19 22:32:58,369-Speed 3358.41 samples/sec Loss 7.5616 LearningRate 0.0653 Epoch: 10 Global Step: 54210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:01,371-Speed 3411.84 samples/sec Loss 7.7266 LearningRate 0.0653 Epoch: 10 Global Step: 54220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:04,347-Speed 3442.18 samples/sec Loss 7.7936 LearningRate 0.0652 Epoch: 10 Global Step: 54230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:07,336-Speed 3427.02 samples/sec Loss 7.5265 LearningRate 0.0652 Epoch: 10 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:10,324-Speed 3427.86 samples/sec Loss 7.5428 LearningRate 0.0652 Epoch: 10 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:13,358-Speed 3375.28 samples/sec Loss 7.6287 LearningRate 0.0652 Epoch: 10 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:16,514-Speed 3246.07 samples/sec Loss 7.5564 LearningRate 0.0652 Epoch: 10 Global Step: 54270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:19,538-Speed 3387.38 samples/sec Loss 7.6065 LearningRate 0.0651 Epoch: 10 Global Step: 54280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:22,513-Speed 3442.56 samples/sec Loss 7.5829 LearningRate 0.0651 Epoch: 10 Global Step: 54290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:25,526-Speed 3399.87 samples/sec Loss 7.6114 LearningRate 0.0651 Epoch: 10 Global Step: 54300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:28,651-Speed 3276.68 samples/sec Loss 7.5601 LearningRate 0.0651 Epoch: 10 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:31,632-Speed 3436.57 samples/sec Loss 7.6838 LearningRate 0.0651 Epoch: 10 Global Step: 54320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:33:34,619-Speed 3428.97 samples/sec 
Loss 7.8299 LearningRate 0.0651 Epoch: 10 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:37,607-Speed 3428.08 samples/sec Loss 7.4883 LearningRate 0.0650 Epoch: 10 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:40,585-Speed 3439.25 samples/sec Loss 7.4472 LearningRate 0.0650 Epoch: 10 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:43,572-Speed 3429.75 samples/sec Loss 7.7957 LearningRate 0.0650 Epoch: 10 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:46,551-Speed 3437.89 samples/sec Loss 7.7124 LearningRate 0.0650 Epoch: 10 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:49,548-Speed 3417.87 samples/sec Loss 7.6027 LearningRate 0.0650 Epoch: 10 Global Step: 54380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:52,538-Speed 3427.55 samples/sec Loss 7.5816 LearningRate 0.0650 Epoch: 10 Global Step: 54390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:55,533-Speed 3419.91 samples/sec Loss 7.7718 LearningRate 0.0649 Epoch: 10 Global Step: 54400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:33:58,515-Speed 3434.65 samples/sec Loss 7.6727 LearningRate 0.0649 Epoch: 10 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:34:01,493-Speed 3439.46 samples/sec Loss 7.6154 LearningRate 0.0649 Epoch: 10 Global Step: 54420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:34:04,468-Speed 3442.42 samples/sec Loss 7.4982 LearningRate 0.0649 Epoch: 10 Global Step: 54430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-19 22:34:07,446-Speed 3439.81 samples/sec Loss 7.6566 LearningRate 0.0649 Epoch: 10 Global Step: 54440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:34:10,479-Speed 3376.97 samples/sec Loss 7.5539 LearningRate 0.0648 Epoch: 10 
Global Step: 54450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:34:13,583-Speed 3300.40 samples/sec Loss 7.5695 LearningRate 0.0648 Epoch: 10 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:34:16,570-Speed 3428.78 samples/sec Loss 7.5257 LearningRate 0.0648 Epoch: 10 Global Step: 54470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:34:19,556-Speed 3429.94 samples/sec Loss 7.6823 LearningRate 0.0648 Epoch: 10 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:34:22,543-Speed 3429.05 samples/sec Loss 7.5077 LearningRate 0.0648 Epoch: 10 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:34:25,506-Speed 3456.87 samples/sec Loss 7.5781 LearningRate 0.0648 Epoch: 10 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:34:28,753-Speed 3154.52 samples/sec Loss 7.6008 LearningRate 0.0647 Epoch: 10 Global Step: 54510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:34:32,724-Speed 2579.70 samples/sec Loss 7.5996 LearningRate 0.0647 Epoch: 10 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:34:35,949-Speed 3175.82 samples/sec Loss 7.4583 LearningRate 0.0647 Epoch: 10 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:34:38,976-Speed 3382.93 samples/sec Loss 7.6084 LearningRate 0.0647 Epoch: 10 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:34:41,965-Speed 3427.03 samples/sec Loss 7.5815 LearningRate 0.0647 Epoch: 10 Global Step: 54550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:34:44,942-Speed 3441.23 samples/sec Loss 7.5110 LearningRate 0.0646 Epoch: 10 Global Step: 54560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:34:47,935-Speed 3421.97 samples/sec Loss 7.5261 LearningRate 0.0646 Epoch: 10 Global Step: 54570 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-01-19 22:34:50,945-Speed 3402.52 samples/sec Loss 7.7205 LearningRate 0.0646 Epoch: 10 Global Step: 54580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:34:53,990-Speed 3364.56 samples/sec Loss 7.5791 LearningRate 0.0646 Epoch: 10 Global Step: 54590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:34:56,990-Speed 3414.37 samples/sec Loss 7.5400 LearningRate 0.0646 Epoch: 10 Global Step: 54600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:34:59,965-Speed 3442.63 samples/sec Loss 7.6151 LearningRate 0.0646 Epoch: 10 Global Step: 54610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:35:02,934-Speed 3450.33 samples/sec Loss 7.5154 LearningRate 0.0645 Epoch: 10 Global Step: 54620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:35:05,922-Speed 3427.18 samples/sec Loss 7.6256 LearningRate 0.0645 Epoch: 10 Global Step: 54630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:35:08,904-Speed 3435.11 samples/sec Loss 7.5791 LearningRate 0.0645 Epoch: 10 Global Step: 54640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:35:11,971-Speed 3340.12 samples/sec Loss 7.3405 LearningRate 0.0645 Epoch: 10 Global Step: 54650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:15,095-Speed 3278.13 samples/sec Loss 7.5891 LearningRate 0.0645 Epoch: 10 Global Step: 54660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:18,084-Speed 3427.30 samples/sec Loss 7.4718 LearningRate 0.0644 Epoch: 10 Global Step: 54670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:21,101-Speed 3394.63 samples/sec Loss 7.6952 LearningRate 0.0644 Epoch: 10 Global Step: 54680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:24,089-Speed 3428.38 samples/sec Loss 7.5936 LearningRate 0.0644 Epoch: 10 Global Step: 54690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 
22:35:27,091-Speed 3412.63 samples/sec Loss 7.4938 LearningRate 0.0644 Epoch: 10 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:30,083-Speed 3422.10 samples/sec Loss 7.5303 LearningRate 0.0644 Epoch: 10 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:33,212-Speed 3273.78 samples/sec Loss 7.7177 LearningRate 0.0644 Epoch: 10 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:36,226-Speed 3398.82 samples/sec Loss 7.6172 LearningRate 0.0643 Epoch: 10 Global Step: 54730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:39,246-Speed 3390.64 samples/sec Loss 7.5147 LearningRate 0.0643 Epoch: 10 Global Step: 54740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:35:42,228-Speed 3435.68 samples/sec Loss 7.7320 LearningRate 0.0643 Epoch: 10 Global Step: 54750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:35:45,256-Speed 3381.84 samples/sec Loss 7.5940 LearningRate 0.0643 Epoch: 10 Global Step: 54760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:35:48,273-Speed 3396.13 samples/sec Loss 7.6263 LearningRate 0.0643 Epoch: 10 Global Step: 54770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:35:51,253-Speed 3436.70 samples/sec Loss 7.6569 LearningRate 0.0642 Epoch: 10 Global Step: 54780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:35:54,243-Speed 3426.11 samples/sec Loss 7.4392 LearningRate 0.0642 Epoch: 10 Global Step: 54790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:35:57,209-Speed 3453.51 samples/sec Loss 7.6602 LearningRate 0.0642 Epoch: 10 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:00,188-Speed 3437.57 samples/sec Loss 7.5638 LearningRate 0.0642 Epoch: 10 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:03,172-Speed 3433.31 samples/sec Loss 7.5913 
LearningRate 0.0642 Epoch: 10 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:06,158-Speed 3429.82 samples/sec Loss 7.4719 LearningRate 0.0642 Epoch: 10 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:09,193-Speed 3375.34 samples/sec Loss 7.5954 LearningRate 0.0641 Epoch: 10 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:12,177-Speed 3431.96 samples/sec Loss 7.6314 LearningRate 0.0641 Epoch: 10 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:15,151-Speed 3443.95 samples/sec Loss 7.6343 LearningRate 0.0641 Epoch: 10 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:18,141-Speed 3426.47 samples/sec Loss 7.5196 LearningRate 0.0641 Epoch: 10 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:21,124-Speed 3433.71 samples/sec Loss 7.5341 LearningRate 0.0641 Epoch: 10 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:24,098-Speed 3443.80 samples/sec Loss 7.5755 LearningRate 0.0641 Epoch: 10 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:27,061-Speed 3457.75 samples/sec Loss 7.6095 LearningRate 0.0640 Epoch: 10 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:36:30,050-Speed 3425.90 samples/sec Loss 7.4709 LearningRate 0.0640 Epoch: 10 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:36:33,026-Speed 3441.72 samples/sec Loss 7.4620 LearningRate 0.0640 Epoch: 10 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:36:36,005-Speed 3438.92 samples/sec Loss 7.5731 LearningRate 0.0640 Epoch: 10 Global Step: 54930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:36:38,990-Speed 3431.27 samples/sec Loss 7.5513 LearningRate 0.0640 Epoch: 10 Global Step: 54940 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:36:41,968-Speed 3438.84 samples/sec Loss 7.5985 LearningRate 0.0639 Epoch: 10 Global Step: 54950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:36:45,002-Speed 3376.49 samples/sec Loss 7.6827 LearningRate 0.0639 Epoch: 10 Global Step: 54960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:36:48,019-Speed 3395.19 samples/sec Loss 7.5623 LearningRate 0.0639 Epoch: 10 Global Step: 54970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:36:50,984-Speed 3455.23 samples/sec Loss 7.5661 LearningRate 0.0639 Epoch: 10 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:53,964-Speed 3437.28 samples/sec Loss 7.4745 LearningRate 0.0639 Epoch: 10 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:56,945-Speed 3435.45 samples/sec Loss 7.5757 LearningRate 0.0639 Epoch: 10 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:36:59,920-Speed 3442.53 samples/sec Loss 7.6528 LearningRate 0.0638 Epoch: 10 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:37:02,908-Speed 3428.33 samples/sec Loss 7.5192 LearningRate 0.0638 Epoch: 10 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:37:05,911-Speed 3410.89 samples/sec Loss 7.5853 LearningRate 0.0638 Epoch: 10 Global Step: 55030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:37:08,902-Speed 3424.20 samples/sec Loss 7.4745 LearningRate 0.0638 Epoch: 10 Global Step: 55040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:37:11,892-Speed 3426.06 samples/sec Loss 7.5430 LearningRate 0.0638 Epoch: 10 Global Step: 55050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:37:14,875-Speed 3433.53 samples/sec Loss 7.5771 LearningRate 0.0637 Epoch: 10 Global Step: 55060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-19 22:37:17,860-Speed 3433.02 samples/sec Loss 7.4163 LearningRate 0.0637 Epoch: 10 Global Step: 55070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:37:20,831-Speed 3447.23 samples/sec Loss 7.5017 LearningRate 0.0637 Epoch: 10 Global Step: 55080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:23,839-Speed 3404.73 samples/sec Loss 7.5400 LearningRate 0.0637 Epoch: 10 Global Step: 55090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:26,854-Speed 3397.43 samples/sec Loss 7.4861 LearningRate 0.0637 Epoch: 10 Global Step: 55100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:29,847-Speed 3422.53 samples/sec Loss 7.6072 LearningRate 0.0637 Epoch: 10 Global Step: 55110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:32,886-Speed 3370.85 samples/sec Loss 7.5807 LearningRate 0.0636 Epoch: 10 Global Step: 55120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:35,954-Speed 3337.94 samples/sec Loss 7.3664 LearningRate 0.0636 Epoch: 10 Global Step: 55130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:38,931-Speed 3440.65 samples/sec Loss 7.4841 LearningRate 0.0636 Epoch: 10 Global Step: 55140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:41,980-Speed 3359.81 samples/sec Loss 7.6168 LearningRate 0.0636 Epoch: 10 Global Step: 55150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:44,962-Speed 3435.09 samples/sec Loss 7.5126 LearningRate 0.0636 Epoch: 10 Global Step: 55160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:47,995-Speed 3376.70 samples/sec Loss 7.5258 LearningRate 0.0636 Epoch: 10 Global Step: 55170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:37:50,968-Speed 3445.79 samples/sec Loss 7.4481 LearningRate 0.0635 Epoch: 10 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:37:54,053-Speed 3319.54 samples/sec Loss 
7.6353 LearningRate 0.0635 Epoch: 10 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:37:57,045-Speed 3423.34 samples/sec Loss 7.4787 LearningRate 0.0635 Epoch: 10 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:00,040-Speed 3420.75 samples/sec Loss 7.6763 LearningRate 0.0635 Epoch: 10 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:03,015-Speed 3442.49 samples/sec Loss 7.5174 LearningRate 0.0635 Epoch: 10 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:06,005-Speed 3426.15 samples/sec Loss 7.4482 LearningRate 0.0634 Epoch: 10 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:09,019-Speed 3398.16 samples/sec Loss 7.4105 LearningRate 0.0634 Epoch: 10 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:12,061-Speed 3367.88 samples/sec Loss 7.4872 LearningRate 0.0634 Epoch: 10 Global Step: 55250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:15,049-Speed 3428.10 samples/sec Loss 7.5225 LearningRate 0.0634 Epoch: 10 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:18,029-Speed 3436.25 samples/sec Loss 7.4523 LearningRate 0.0634 Epoch: 10 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:21,070-Speed 3368.71 samples/sec Loss 7.4755 LearningRate 0.0634 Epoch: 10 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:38:24,150-Speed 3325.62 samples/sec Loss 7.4504 LearningRate 0.0633 Epoch: 10 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:27,150-Speed 3414.60 samples/sec Loss 7.5966 LearningRate 0.0633 Epoch: 10 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:30,130-Speed 3436.80 samples/sec Loss 7.5228 LearningRate 0.0633 Epoch: 10 Global Step: 
55310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:33,129-Speed 3416.43 samples/sec Loss 7.4835 LearningRate 0.0633 Epoch: 10 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:36,125-Speed 3418.35 samples/sec Loss 7.5125 LearningRate 0.0633 Epoch: 10 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:39,104-Speed 3437.89 samples/sec Loss 7.2875 LearningRate 0.0632 Epoch: 10 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:42,095-Speed 3424.59 samples/sec Loss 7.4065 LearningRate 0.0632 Epoch: 10 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:45,077-Speed 3435.36 samples/sec Loss 7.3956 LearningRate 0.0632 Epoch: 10 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:48,057-Speed 3436.90 samples/sec Loss 7.4126 LearningRate 0.0632 Epoch: 10 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:51,034-Speed 3440.43 samples/sec Loss 7.5467 LearningRate 0.0632 Epoch: 10 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:38:54,013-Speed 3438.36 samples/sec Loss 7.3809 LearningRate 0.0632 Epoch: 10 Global Step: 55390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:38:57,022-Speed 3404.66 samples/sec Loss 7.6650 LearningRate 0.0631 Epoch: 10 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:00,106-Speed 3320.45 samples/sec Loss 7.5378 LearningRate 0.0631 Epoch: 10 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:03,106-Speed 3414.48 samples/sec Loss 7.3990 LearningRate 0.0631 Epoch: 10 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:06,136-Speed 3381.14 samples/sec Loss 7.5662 LearningRate 0.0631 Epoch: 10 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-01-19 22:39:09,198-Speed 3344.95 samples/sec Loss 7.4411 LearningRate 0.0631 Epoch: 10 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:12,171-Speed 3445.50 samples/sec Loss 7.3075 LearningRate 0.0631 Epoch: 10 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:15,157-Speed 3429.85 samples/sec Loss 7.7357 LearningRate 0.0630 Epoch: 10 Global Step: 55460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:18,140-Speed 3433.12 samples/sec Loss 7.4776 LearningRate 0.0630 Epoch: 10 Global Step: 55470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:21,175-Speed 3375.21 samples/sec Loss 7.4173 LearningRate 0.0630 Epoch: 10 Global Step: 55480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:24,284-Speed 3293.89 samples/sec Loss 7.4882 LearningRate 0.0630 Epoch: 10 Global Step: 55490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:39:27,250-Speed 3453.73 samples/sec Loss 7.4837 LearningRate 0.0630 Epoch: 10 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:30,320-Speed 3336.78 samples/sec Loss 7.5695 LearningRate 0.0629 Epoch: 10 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:33,292-Speed 3446.87 samples/sec Loss 7.5677 LearningRate 0.0629 Epoch: 10 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:36,319-Speed 3383.17 samples/sec Loss 7.5623 LearningRate 0.0629 Epoch: 10 Global Step: 55530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:39,298-Speed 3438.68 samples/sec Loss 7.4885 LearningRate 0.0629 Epoch: 10 Global Step: 55540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:42,300-Speed 3411.68 samples/sec Loss 7.4224 LearningRate 0.0629 Epoch: 10 Global Step: 55550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:45,357-Speed 3350.78 
samples/sec Loss 7.5170 LearningRate 0.0629 Epoch: 10 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:48,379-Speed 3389.44 samples/sec Loss 7.6093 LearningRate 0.0628 Epoch: 10 Global Step: 55570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:51,429-Speed 3358.05 samples/sec Loss 7.4160 LearningRate 0.0628 Epoch: 10 Global Step: 55580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:54,429-Speed 3414.66 samples/sec Loss 7.5563 LearningRate 0.0628 Epoch: 10 Global Step: 55590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:39:57,474-Speed 3364.13 samples/sec Loss 7.4551 LearningRate 0.0628 Epoch: 10 Global Step: 55600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:40:00,510-Speed 3373.35 samples/sec Loss 7.5232 LearningRate 0.0628 Epoch: 10 Global Step: 55610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:40:03,565-Speed 3353.24 samples/sec Loss 7.3031 LearningRate 0.0628 Epoch: 10 Global Step: 55620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:40:06,759-Speed 3206.73 samples/sec Loss 7.4235 LearningRate 0.0627 Epoch: 10 Global Step: 55630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:40:20,007-Speed 773.03 samples/sec Loss 7.3228 LearningRate 0.0627 Epoch: 11 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:40:23,008-Speed 3413.60 samples/sec Loss 6.8019 LearningRate 0.0627 Epoch: 11 Global Step: 55650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:40:26,016-Speed 3404.79 samples/sec Loss 6.8027 LearningRate 0.0627 Epoch: 11 Global Step: 55660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:40:28,999-Speed 3434.08 samples/sec Loss 6.6944 LearningRate 0.0627 Epoch: 11 Global Step: 55670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:31,977-Speed 3440.24 samples/sec Loss 6.7264 LearningRate 0.0626 
Epoch: 11 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:35,001-Speed 3386.95 samples/sec Loss 6.8454 LearningRate 0.0626 Epoch: 11 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:37,994-Speed 3421.84 samples/sec Loss 6.7418 LearningRate 0.0626 Epoch: 11 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:40,975-Speed 3435.87 samples/sec Loss 6.7105 LearningRate 0.0626 Epoch: 11 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:43,987-Speed 3400.62 samples/sec Loss 6.8150 LearningRate 0.0626 Epoch: 11 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:47,046-Speed 3349.10 samples/sec Loss 6.6825 LearningRate 0.0626 Epoch: 11 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:50,126-Speed 3325.46 samples/sec Loss 6.7770 LearningRate 0.0625 Epoch: 11 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:53,172-Speed 3362.38 samples/sec Loss 6.8551 LearningRate 0.0625 Epoch: 11 Global Step: 55750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:56,157-Speed 3431.53 samples/sec Loss 6.8431 LearningRate 0.0625 Epoch: 11 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:40:59,286-Speed 3273.87 samples/sec Loss 6.7521 LearningRate 0.0625 Epoch: 11 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:41:02,274-Speed 3428.07 samples/sec Loss 6.9375 LearningRate 0.0625 Epoch: 11 Global Step: 55780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:41:05,282-Speed 3404.98 samples/sec Loss 6.8888 LearningRate 0.0625 Epoch: 11 Global Step: 55790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:41:08,302-Speed 3391.56 samples/sec Loss 6.7590 LearningRate 0.0624 Epoch: 11 Global Step: 55800 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-19 22:41:11,317-Speed 3397.20 samples/sec Loss 6.8230 LearningRate 0.0624 Epoch: 11 Global Step: 55810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:41:14,312-Speed 3419.26 samples/sec Loss 6.8052 LearningRate 0.0624 Epoch: 11 Global Step: 55820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:41:17,290-Speed 3440.18 samples/sec Loss 6.9981 LearningRate 0.0624 Epoch: 11 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:20,287-Speed 3417.64 samples/sec Loss 7.0361 LearningRate 0.0624 Epoch: 11 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:23,270-Speed 3433.79 samples/sec Loss 6.8148 LearningRate 0.0623 Epoch: 11 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:26,257-Speed 3429.70 samples/sec Loss 6.8730 LearningRate 0.0623 Epoch: 11 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:29,272-Speed 3397.22 samples/sec Loss 6.9701 LearningRate 0.0623 Epoch: 11 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:32,291-Speed 3393.73 samples/sec Loss 6.8390 LearningRate 0.0623 Epoch: 11 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:35,286-Speed 3420.15 samples/sec Loss 6.9166 LearningRate 0.0623 Epoch: 11 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:38,278-Speed 3422.72 samples/sec Loss 6.9063 LearningRate 0.0623 Epoch: 11 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:41,261-Speed 3433.87 samples/sec Loss 6.9124 LearningRate 0.0622 Epoch: 11 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:44,270-Speed 3403.65 samples/sec Loss 6.9946 LearningRate 0.0622 Epoch: 11 Global Step: 55920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 
22:41:47,263-Speed 3422.31 samples/sec Loss 7.0071 LearningRate 0.0622 Epoch: 11 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:41:50,233-Speed 3449.79 samples/sec Loss 6.9273 LearningRate 0.0622 Epoch: 11 Global Step: 55940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:53,234-Speed 3412.46 samples/sec Loss 7.0080 LearningRate 0.0622 Epoch: 11 Global Step: 55950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:56,218-Speed 3433.20 samples/sec Loss 7.2025 LearningRate 0.0622 Epoch: 11 Global Step: 55960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:41:59,194-Speed 3441.51 samples/sec Loss 6.9932 LearningRate 0.0621 Epoch: 11 Global Step: 55970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:42:02,183-Speed 3427.09 samples/sec Loss 7.0659 LearningRate 0.0621 Epoch: 11 Global Step: 55980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:42:05,160-Speed 3440.21 samples/sec Loss 7.0259 LearningRate 0.0621 Epoch: 11 Global Step: 55990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:42:08,154-Speed 3420.76 samples/sec Loss 7.1000 LearningRate 0.0621 Epoch: 11 Global Step: 56000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:42:51,128-[lfw][56000]XNorm: 21.668159 Training: 2022-01-19 22:42:51,129-[lfw][56000]Accuracy-Flip: 0.99650+-0.00345 Training: 2022-01-19 22:42:51,129-[lfw][56000]Accuracy-Highest: 0.99767 Training: 2022-01-19 22:43:41,189-[cfp_fp][56000]XNorm: 19.305356 Training: 2022-01-19 22:43:41,190-[cfp_fp][56000]Accuracy-Flip: 0.96829+-0.00731 Training: 2022-01-19 22:43:41,190-[cfp_fp][56000]Accuracy-Highest: 0.96829 Training: 2022-01-19 22:44:24,602-[agedb_30][56000]XNorm: 21.460912 Training: 2022-01-19 22:44:24,602-[agedb_30][56000]Accuracy-Flip: 0.97567+-0.00834 Training: 2022-01-19 22:44:24,603-[agedb_30][56000]Accuracy-Highest: 0.97667 Training: 2022-01-19 22:44:27,616-Speed 73.43 samples/sec 
Loss 7.1343 LearningRate 0.0621 Epoch: 11 Global Step: 56010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:30,588-Speed 3446.73 samples/sec Loss 7.0819 LearningRate 0.0620 Epoch: 11 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:33,558-Speed 3448.81 samples/sec Loss 6.9800 LearningRate 0.0620 Epoch: 11 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:36,538-Speed 3437.06 samples/sec Loss 7.2466 LearningRate 0.0620 Epoch: 11 Global Step: 56040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:44:39,484-Speed 3475.94 samples/sec Loss 7.0855 LearningRate 0.0620 Epoch: 11 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:42,464-Speed 3437.93 samples/sec Loss 7.0686 LearningRate 0.0620 Epoch: 11 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:45,465-Speed 3413.42 samples/sec Loss 7.1310 LearningRate 0.0620 Epoch: 11 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:48,458-Speed 3422.18 samples/sec Loss 7.1533 LearningRate 0.0619 Epoch: 11 Global Step: 56080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:51,633-Speed 3226.41 samples/sec Loss 7.0256 LearningRate 0.0619 Epoch: 11 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:54,811-Speed 3222.74 samples/sec Loss 7.1594 LearningRate 0.0619 Epoch: 11 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:44:57,784-Speed 3445.18 samples/sec Loss 7.1129 LearningRate 0.0619 Epoch: 11 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:00,814-Speed 3380.86 samples/sec Loss 7.1679 LearningRate 0.0619 Epoch: 11 Global Step: 56120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:03,838-Speed 3386.49 samples/sec Loss 7.1241 LearningRate 0.0619 Epoch: 11 Global Step: 
56130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:06,811-Speed 3445.77 samples/sec Loss 7.0816 LearningRate 0.0618 Epoch: 11 Global Step: 56140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:09,790-Speed 3438.71 samples/sec Loss 6.9756 LearningRate 0.0618 Epoch: 11 Global Step: 56150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:12,818-Speed 3382.58 samples/sec Loss 7.1478 LearningRate 0.0618 Epoch: 11 Global Step: 56160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:15,869-Speed 3356.68 samples/sec Loss 7.1841 LearningRate 0.0618 Epoch: 11 Global Step: 56170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:18,913-Speed 3365.09 samples/sec Loss 7.1418 LearningRate 0.0618 Epoch: 11 Global Step: 56180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:21,999-Speed 3320.15 samples/sec Loss 7.0850 LearningRate 0.0617 Epoch: 11 Global Step: 56190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:25,042-Speed 3365.83 samples/sec Loss 7.2639 LearningRate 0.0617 Epoch: 11 Global Step: 56200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:28,057-Speed 3397.08 samples/sec Loss 7.2393 LearningRate 0.0617 Epoch: 11 Global Step: 56210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:31,090-Speed 3377.36 samples/sec Loss 7.1794 LearningRate 0.0617 Epoch: 11 Global Step: 56220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:34,089-Speed 3414.69 samples/sec Loss 7.1808 LearningRate 0.0617 Epoch: 11 Global Step: 56230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:37,133-Speed 3365.67 samples/sec Loss 6.9697 LearningRate 0.0617 Epoch: 11 Global Step: 56240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:40,132-Speed 3415.50 samples/sec Loss 7.0649 LearningRate 0.0616 Epoch: 11 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-01-19 22:45:43,121-Speed 3425.99 samples/sec Loss 7.1638 LearningRate 0.0616 Epoch: 11 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:45:46,169-Speed 3360.90 samples/sec Loss 7.1646 LearningRate 0.0616 Epoch: 11 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:45:49,162-Speed 3421.96 samples/sec Loss 7.0865 LearningRate 0.0616 Epoch: 11 Global Step: 56280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:45:52,152-Speed 3426.34 samples/sec Loss 7.0329 LearningRate 0.0616 Epoch: 11 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:55,128-Speed 3441.48 samples/sec Loss 7.2361 LearningRate 0.0616 Epoch: 11 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:45:58,209-Speed 3324.29 samples/sec Loss 7.2874 LearningRate 0.0615 Epoch: 11 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:01,188-Speed 3438.31 samples/sec Loss 7.1099 LearningRate 0.0615 Epoch: 11 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:04,188-Speed 3414.74 samples/sec Loss 7.0792 LearningRate 0.0615 Epoch: 11 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:07,203-Speed 3398.55 samples/sec Loss 7.0260 LearningRate 0.0615 Epoch: 11 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:10,201-Speed 3415.42 samples/sec Loss 7.0654 LearningRate 0.0615 Epoch: 11 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:13,202-Speed 3413.14 samples/sec Loss 7.1268 LearningRate 0.0614 Epoch: 11 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:16,181-Speed 3439.40 samples/sec Loss 7.2299 LearningRate 0.0614 Epoch: 11 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:19,186-Speed 3408.50 
samples/sec Loss 7.3210 LearningRate 0.0614 Epoch: 11 Global Step: 56380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:22,222-Speed 3373.85 samples/sec Loss 7.0793 LearningRate 0.0614 Epoch: 11 Global Step: 56390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:46:25,216-Speed 3420.34 samples/sec Loss 7.1467 LearningRate 0.0614 Epoch: 11 Global Step: 56400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:46:28,201-Speed 3431.80 samples/sec Loss 7.0040 LearningRate 0.0614 Epoch: 11 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:46:31,218-Speed 3395.43 samples/sec Loss 7.2412 LearningRate 0.0613 Epoch: 11 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:46:34,242-Speed 3386.29 samples/sec Loss 7.1876 LearningRate 0.0613 Epoch: 11 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:46:37,229-Speed 3429.37 samples/sec Loss 7.1352 LearningRate 0.0613 Epoch: 11 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:46:40,204-Speed 3443.04 samples/sec Loss 7.2540 LearningRate 0.0613 Epoch: 11 Global Step: 56450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:46:43,163-Speed 3461.87 samples/sec Loss 7.4233 LearningRate 0.0613 Epoch: 11 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:46,138-Speed 3443.58 samples/sec Loss 7.2540 LearningRate 0.0613 Epoch: 11 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:49,123-Speed 3431.39 samples/sec Loss 7.1853 LearningRate 0.0612 Epoch: 11 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:52,126-Speed 3410.13 samples/sec Loss 7.0187 LearningRate 0.0612 Epoch: 11 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:55,108-Speed 3435.47 samples/sec Loss 7.1498 LearningRate 0.0612 
Epoch: 11 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:46:58,089-Speed 3435.92 samples/sec Loss 7.2802 LearningRate 0.0612 Epoch: 11 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:47:01,110-Speed 3390.04 samples/sec Loss 7.3168 LearningRate 0.0612 Epoch: 11 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:47:04,092-Speed 3434.49 samples/sec Loss 7.3292 LearningRate 0.0611 Epoch: 11 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:47:07,072-Speed 3437.31 samples/sec Loss 7.1968 LearningRate 0.0611 Epoch: 11 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:47:10,102-Speed 3381.62 samples/sec Loss 7.0886 LearningRate 0.0611 Epoch: 11 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:47:13,079-Speed 3440.50 samples/sec Loss 7.2351 LearningRate 0.0611 Epoch: 11 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:16,062-Speed 3434.28 samples/sec Loss 7.2926 LearningRate 0.0611 Epoch: 11 Global Step: 56570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:19,051-Speed 3426.26 samples/sec Loss 7.2408 LearningRate 0.0611 Epoch: 11 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:22,034-Speed 3434.20 samples/sec Loss 7.3348 LearningRate 0.0610 Epoch: 11 Global Step: 56590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:25,010-Speed 3440.61 samples/sec Loss 7.3612 LearningRate 0.0610 Epoch: 11 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:28,004-Speed 3421.41 samples/sec Loss 7.2091 LearningRate 0.0610 Epoch: 11 Global Step: 56610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:31,092-Speed 3317.16 samples/sec Loss 7.0916 LearningRate 0.0610 Epoch: 11 Global Step: 56620 Fp16 Grad Scale: 
131072 Required: 7 hours Training: 2022-01-19 22:47:34,092-Speed 3414.53 samples/sec Loss 7.1838 LearningRate 0.0610 Epoch: 11 Global Step: 56630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:37,076-Speed 3433.10 samples/sec Loss 7.2219 LearningRate 0.0610 Epoch: 11 Global Step: 56640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:40,209-Speed 3269.11 samples/sec Loss 7.3665 LearningRate 0.0609 Epoch: 11 Global Step: 56650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:43,190-Speed 3435.89 samples/sec Loss 7.1293 LearningRate 0.0609 Epoch: 11 Global Step: 56660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:46,172-Speed 3434.41 samples/sec Loss 7.3082 LearningRate 0.0609 Epoch: 11 Global Step: 56670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:49,156-Speed 3432.48 samples/sec Loss 7.2003 LearningRate 0.0609 Epoch: 11 Global Step: 56680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:52,146-Speed 3425.77 samples/sec Loss 7.1299 LearningRate 0.0609 Epoch: 11 Global Step: 56690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:55,131-Speed 3431.36 samples/sec Loss 7.1779 LearningRate 0.0609 Epoch: 11 Global Step: 56700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:47:58,113-Speed 3434.32 samples/sec Loss 7.1619 LearningRate 0.0608 Epoch: 11 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:48:01,089-Speed 3442.80 samples/sec Loss 7.2278 LearningRate 0.0608 Epoch: 11 Global Step: 56720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:04,121-Speed 3378.24 samples/sec Loss 7.3507 LearningRate 0.0608 Epoch: 11 Global Step: 56730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:07,098-Speed 3440.33 samples/sec Loss 7.1248 LearningRate 0.0608 Epoch: 11 Global Step: 56740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 
22:48:10,089-Speed 3425.10 samples/sec Loss 7.2161 LearningRate 0.0608 Epoch: 11 Global Step: 56750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:13,067-Speed 3438.76 samples/sec Loss 7.2370 LearningRate 0.0607 Epoch: 11 Global Step: 56760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:16,062-Speed 3420.00 samples/sec Loss 7.3077 LearningRate 0.0607 Epoch: 11 Global Step: 56770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:19,060-Speed 3416.46 samples/sec Loss 7.2360 LearningRate 0.0607 Epoch: 11 Global Step: 56780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:22,180-Speed 3283.25 samples/sec Loss 7.2275 LearningRate 0.0607 Epoch: 11 Global Step: 56790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:25,231-Speed 3357.28 samples/sec Loss 7.2178 LearningRate 0.0607 Epoch: 11 Global Step: 56800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:28,311-Speed 3325.04 samples/sec Loss 7.3647 LearningRate 0.0607 Epoch: 11 Global Step: 56810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:31,313-Speed 3412.86 samples/sec Loss 7.2204 LearningRate 0.0606 Epoch: 11 Global Step: 56820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:48:34,301-Speed 3427.59 samples/sec Loss 7.2746 LearningRate 0.0606 Epoch: 11 Global Step: 56830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:48:37,289-Speed 3428.41 samples/sec Loss 7.4028 LearningRate 0.0606 Epoch: 11 Global Step: 56840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:48:40,259-Speed 3448.64 samples/sec Loss 7.1423 LearningRate 0.0606 Epoch: 11 Global Step: 56850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:43,247-Speed 3427.68 samples/sec Loss 7.1924 LearningRate 0.0606 Epoch: 11 Global Step: 56860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:46,238-Speed 3423.81 samples/sec Loss 7.3011 
LearningRate 0.0606 Epoch: 11 Global Step: 56870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:49,227-Speed 3427.33 samples/sec Loss 7.3089 LearningRate 0.0605 Epoch: 11 Global Step: 56880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:52,235-Speed 3405.47 samples/sec Loss 7.1498 LearningRate 0.0605 Epoch: 11 Global Step: 56890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:55,323-Speed 3316.72 samples/sec Loss 7.3663 LearningRate 0.0605 Epoch: 11 Global Step: 56900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:48:58,372-Speed 3358.83 samples/sec Loss 7.2473 LearningRate 0.0605 Epoch: 11 Global Step: 56910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:49:01,367-Speed 3420.70 samples/sec Loss 7.2424 LearningRate 0.0605 Epoch: 11 Global Step: 56920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:49:04,370-Speed 3409.93 samples/sec Loss 7.2518 LearningRate 0.0605 Epoch: 11 Global Step: 56930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:49:07,349-Speed 3438.52 samples/sec Loss 7.3008 LearningRate 0.0604 Epoch: 11 Global Step: 56940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:49:10,366-Speed 3395.13 samples/sec Loss 7.1837 LearningRate 0.0604 Epoch: 11 Global Step: 56950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:13,349-Speed 3433.67 samples/sec Loss 7.2364 LearningRate 0.0604 Epoch: 11 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:16,333-Speed 3432.67 samples/sec Loss 7.1113 LearningRate 0.0604 Epoch: 11 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:19,317-Speed 3433.08 samples/sec Loss 7.2355 LearningRate 0.0604 Epoch: 11 Global Step: 56980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:22,323-Speed 3407.25 samples/sec Loss 7.1231 LearningRate 0.0603 Epoch: 11 Global Step: 56990 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:25,318-Speed 3420.27 samples/sec Loss 7.3704 LearningRate 0.0603 Epoch: 11 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:28,305-Speed 3428.69 samples/sec Loss 7.3833 LearningRate 0.0603 Epoch: 11 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:31,323-Speed 3394.16 samples/sec Loss 7.3239 LearningRate 0.0603 Epoch: 11 Global Step: 57020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:34,318-Speed 3420.53 samples/sec Loss 7.2176 LearningRate 0.0603 Epoch: 11 Global Step: 57030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:37,306-Speed 3427.72 samples/sec Loss 7.3054 LearningRate 0.0603 Epoch: 11 Global Step: 57040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:40,286-Speed 3436.81 samples/sec Loss 7.1960 LearningRate 0.0602 Epoch: 11 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:49:43,351-Speed 3341.95 samples/sec Loss 7.2086 LearningRate 0.0602 Epoch: 11 Global Step: 57060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:46,395-Speed 3365.31 samples/sec Loss 7.1709 LearningRate 0.0602 Epoch: 11 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:49,384-Speed 3426.00 samples/sec Loss 7.1280 LearningRate 0.0602 Epoch: 11 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:52,366-Speed 3435.64 samples/sec Loss 7.2942 LearningRate 0.0602 Epoch: 11 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:55,362-Speed 3418.31 samples/sec Loss 7.2707 LearningRate 0.0602 Epoch: 11 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:49:58,391-Speed 3381.71 samples/sec Loss 7.1751 LearningRate 0.0601 Epoch: 11 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-19 22:50:01,373-Speed 3434.91 samples/sec Loss 7.3867 LearningRate 0.0601 Epoch: 11 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:04,364-Speed 3424.78 samples/sec Loss 7.2523 LearningRate 0.0601 Epoch: 11 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:07,336-Speed 3446.11 samples/sec Loss 7.1900 LearningRate 0.0601 Epoch: 11 Global Step: 57140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:10,363-Speed 3383.84 samples/sec Loss 7.2576 LearningRate 0.0601 Epoch: 11 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:13,455-Speed 3312.05 samples/sec Loss 7.2035 LearningRate 0.0601 Epoch: 11 Global Step: 57160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:50:16,492-Speed 3372.50 samples/sec Loss 7.3656 LearningRate 0.0600 Epoch: 11 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:50:19,602-Speed 3293.61 samples/sec Loss 7.2368 LearningRate 0.0600 Epoch: 11 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:50:22,671-Speed 3338.24 samples/sec Loss 7.1666 LearningRate 0.0600 Epoch: 11 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:50:25,656-Speed 3430.80 samples/sec Loss 7.3755 LearningRate 0.0600 Epoch: 11 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:50:28,668-Speed 3400.77 samples/sec Loss 7.2751 LearningRate 0.0600 Epoch: 11 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:50:31,739-Speed 3335.61 samples/sec Loss 7.2839 LearningRate 0.0599 Epoch: 11 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:50:34,731-Speed 3423.47 samples/sec Loss 7.2236 LearningRate 0.0599 Epoch: 11 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:50:37,702-Speed 3447.71 
samples/sec Loss 7.1138 LearningRate 0.0599 Epoch: 11 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:40,687-Speed 3431.27 samples/sec Loss 7.3509 LearningRate 0.0599 Epoch: 11 Global Step: 57250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:43,702-Speed 3396.80 samples/sec Loss 7.2747 LearningRate 0.0599 Epoch: 11 Global Step: 57260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:46,702-Speed 3414.83 samples/sec Loss 7.2499 LearningRate 0.0599 Epoch: 11 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:49,682-Speed 3437.75 samples/sec Loss 7.1197 LearningRate 0.0598 Epoch: 11 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:52,665-Speed 3438.37 samples/sec Loss 7.2243 LearningRate 0.0598 Epoch: 11 Global Step: 57290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:55,681-Speed 3396.82 samples/sec Loss 7.3468 LearningRate 0.0598 Epoch: 11 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:50:58,741-Speed 3346.34 samples/sec Loss 7.1083 LearningRate 0.0598 Epoch: 11 Global Step: 57310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:01,739-Speed 3416.44 samples/sec Loss 7.2242 LearningRate 0.0598 Epoch: 11 Global Step: 57320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:04,745-Speed 3407.52 samples/sec Loss 7.2381 LearningRate 0.0598 Epoch: 11 Global Step: 57330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:07,762-Speed 3395.35 samples/sec Loss 7.2136 LearningRate 0.0597 Epoch: 11 Global Step: 57340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:51:10,758-Speed 3419.11 samples/sec Loss 7.4759 LearningRate 0.0597 Epoch: 11 Global Step: 57350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:51:13,782-Speed 3386.47 samples/sec Loss 7.2959 LearningRate 0.0597 Epoch: 11 
Global Step: 57360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:51:16,771-Speed 3427.49 samples/sec Loss 7.3113 LearningRate 0.0597 Epoch: 11 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:51:19,813-Speed 3366.36 samples/sec Loss 7.0746 LearningRate 0.0597 Epoch: 11 Global Step: 57380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:51:22,844-Speed 3379.72 samples/sec Loss 7.2833 LearningRate 0.0597 Epoch: 11 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:51:25,825-Speed 3435.93 samples/sec Loss 7.2613 LearningRate 0.0596 Epoch: 11 Global Step: 57400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:28,878-Speed 3354.50 samples/sec Loss 7.2042 LearningRate 0.0596 Epoch: 11 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:31,866-Speed 3428.58 samples/sec Loss 7.2088 LearningRate 0.0596 Epoch: 11 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:34,850-Speed 3431.94 samples/sec Loss 7.2309 LearningRate 0.0596 Epoch: 11 Global Step: 57430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:37,846-Speed 3419.60 samples/sec Loss 7.3586 LearningRate 0.0596 Epoch: 11 Global Step: 57440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:40,830-Speed 3432.01 samples/sec Loss 7.3189 LearningRate 0.0596 Epoch: 11 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:43,856-Speed 3384.65 samples/sec Loss 7.2901 LearningRate 0.0595 Epoch: 11 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:46,866-Speed 3403.62 samples/sec Loss 7.2048 LearningRate 0.0595 Epoch: 11 Global Step: 57470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:49,865-Speed 3415.12 samples/sec Loss 7.3888 LearningRate 0.0595 Epoch: 11 Global Step: 57480 Fp16 Grad Scale: 65536 Required: 
7 hours Training: 2022-01-19 22:51:52,880-Speed 3398.03 samples/sec Loss 7.2063 LearningRate 0.0595 Epoch: 11 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:51:55,876-Speed 3417.57 samples/sec Loss 7.2753 LearningRate 0.0595 Epoch: 11 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:51:58,857-Speed 3436.02 samples/sec Loss 7.2531 LearningRate 0.0594 Epoch: 11 Global Step: 57510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:52:01,809-Speed 3470.19 samples/sec Loss 7.3254 LearningRate 0.0594 Epoch: 11 Global Step: 57520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:04,900-Speed 3313.58 samples/sec Loss 7.3327 LearningRate 0.0594 Epoch: 11 Global Step: 57530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:08,001-Speed 3303.96 samples/sec Loss 6.9684 LearningRate 0.0594 Epoch: 11 Global Step: 57540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:10,979-Speed 3439.70 samples/sec Loss 7.1889 LearningRate 0.0594 Epoch: 11 Global Step: 57550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:13,966-Speed 3429.15 samples/sec Loss 7.1177 LearningRate 0.0594 Epoch: 11 Global Step: 57560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:16,951-Speed 3431.16 samples/sec Loss 7.2326 LearningRate 0.0593 Epoch: 11 Global Step: 57570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:19,932-Speed 3436.10 samples/sec Loss 7.3063 LearningRate 0.0593 Epoch: 11 Global Step: 57580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:22,916-Speed 3432.47 samples/sec Loss 7.2348 LearningRate 0.0593 Epoch: 11 Global Step: 57590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:25,900-Speed 3432.29 samples/sec Loss 7.2415 LearningRate 0.0593 Epoch: 11 Global Step: 57600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:28,918-Speed 
3393.27 samples/sec Loss 7.0805 LearningRate 0.0593 Epoch: 11 Global Step: 57610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:52:32,055-Speed 3266.15 samples/sec Loss 7.1881 LearningRate 0.0593 Epoch: 11 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:35,064-Speed 3402.70 samples/sec Loss 7.3007 LearningRate 0.0592 Epoch: 11 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:38,053-Speed 3428.34 samples/sec Loss 7.1881 LearningRate 0.0592 Epoch: 11 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:41,040-Speed 3429.15 samples/sec Loss 7.3545 LearningRate 0.0592 Epoch: 11 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:44,033-Speed 3421.99 samples/sec Loss 7.2165 LearningRate 0.0592 Epoch: 11 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:47,025-Speed 3423.94 samples/sec Loss 7.2806 LearningRate 0.0592 Epoch: 11 Global Step: 57670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:50,095-Speed 3335.91 samples/sec Loss 7.3151 LearningRate 0.0592 Epoch: 11 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:53,125-Speed 3380.91 samples/sec Loss 7.1448 LearningRate 0.0591 Epoch: 11 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:56,152-Speed 3383.14 samples/sec Loss 7.2055 LearningRate 0.0591 Epoch: 11 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:52:59,161-Speed 3404.16 samples/sec Loss 7.3010 LearningRate 0.0591 Epoch: 11 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:02,156-Speed 3419.61 samples/sec Loss 7.2632 LearningRate 0.0591 Epoch: 11 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:05,170-Speed 3398.76 samples/sec Loss 7.2251 LearningRate 0.0591 
Epoch: 11 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:08,161-Speed 3425.11 samples/sec Loss 7.3480 LearningRate 0.0591 Epoch: 11 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:11,162-Speed 3413.57 samples/sec Loss 7.2400 LearningRate 0.0590 Epoch: 11 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:14,156-Speed 3420.50 samples/sec Loss 7.2086 LearningRate 0.0590 Epoch: 11 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:17,139-Speed 3434.82 samples/sec Loss 7.2948 LearningRate 0.0590 Epoch: 11 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:20,124-Speed 3431.73 samples/sec Loss 7.2536 LearningRate 0.0590 Epoch: 11 Global Step: 57780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:23,137-Speed 3399.04 samples/sec Loss 7.2582 LearningRate 0.0590 Epoch: 11 Global Step: 57790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:26,122-Speed 3431.58 samples/sec Loss 7.2231 LearningRate 0.0589 Epoch: 11 Global Step: 57800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:29,113-Speed 3424.72 samples/sec Loss 7.2024 LearningRate 0.0589 Epoch: 11 Global Step: 57810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:53:32,163-Speed 3358.58 samples/sec Loss 7.1566 LearningRate 0.0589 Epoch: 11 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:53:35,219-Speed 3351.68 samples/sec Loss 7.2387 LearningRate 0.0589 Epoch: 11 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:53:38,200-Speed 3436.40 samples/sec Loss 7.3885 LearningRate 0.0589 Epoch: 11 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:53:41,187-Speed 3428.80 samples/sec Loss 7.1783 LearningRate 0.0589 Epoch: 11 Global Step: 57850 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-19 22:53:44,189-Speed 3411.23 samples/sec Loss 7.1398 LearningRate 0.0588 Epoch: 11 Global Step: 57860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:53:47,197-Speed 3406.01 samples/sec Loss 7.1682 LearningRate 0.0588 Epoch: 11 Global Step: 57870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:53:50,181-Speed 3431.49 samples/sec Loss 7.1969 LearningRate 0.0588 Epoch: 11 Global Step: 57880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:53:53,164-Speed 3434.15 samples/sec Loss 7.2477 LearningRate 0.0588 Epoch: 11 Global Step: 57890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:53:56,173-Speed 3404.16 samples/sec Loss 7.2988 LearningRate 0.0588 Epoch: 11 Global Step: 57900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:53:59,156-Speed 3432.81 samples/sec Loss 7.1778 LearningRate 0.0588 Epoch: 11 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:54:02,155-Speed 3416.44 samples/sec Loss 7.2159 LearningRate 0.0587 Epoch: 11 Global Step: 57920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-19 22:54:05,123-Speed 3451.56 samples/sec Loss 7.2881 LearningRate 0.0587 Epoch: 11 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:54:08,108-Speed 3431.29 samples/sec Loss 7.2720 LearningRate 0.0587 Epoch: 11 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:54:11,090-Speed 3434.64 samples/sec Loss 7.3096 LearningRate 0.0587 Epoch: 11 Global Step: 57950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:54:14,104-Speed 3398.44 samples/sec Loss 7.2632 LearningRate 0.0587 Epoch: 11 Global Step: 57960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:54:17,073-Speed 3450.77 samples/sec Loss 7.2755 LearningRate 0.0587 Epoch: 11 Global Step: 57970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 
22:54:20,068-Speed 3419.43 samples/sec Loss 7.2600 LearningRate 0.0586 Epoch: 11 Global Step: 57980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:54:23,052-Speed 3433.20 samples/sec Loss 7.2819 LearningRate 0.0586 Epoch: 11 Global Step: 57990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:54:26,059-Speed 3405.98 samples/sec Loss 7.2830 LearningRate 0.0586 Epoch: 11 Global Step: 58000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:55:09,354-[lfw][58000]XNorm: 23.046968 Training: 2022-01-19 22:55:09,355-[lfw][58000]Accuracy-Flip: 0.99750+-0.00352 Training: 2022-01-19 22:55:09,355-[lfw][58000]Accuracy-Highest: 0.99767 Training: 2022-01-19 22:55:59,872-[cfp_fp][58000]XNorm: 20.160567 Training: 2022-01-19 22:55:59,873-[cfp_fp][58000]Accuracy-Flip: 0.96357+-0.01113 Training: 2022-01-19 22:55:59,873-[cfp_fp][58000]Accuracy-Highest: 0.96829 Training: 2022-01-19 22:56:43,115-[agedb_30][58000]XNorm: 22.513089 Training: 2022-01-19 22:56:43,116-[agedb_30][58000]Accuracy-Flip: 0.97300+-0.00806 Training: 2022-01-19 22:56:43,117-[agedb_30][58000]Accuracy-Highest: 0.97667 Training: 2022-01-19 22:56:46,128-Speed 73.11 samples/sec Loss 7.2487 LearningRate 0.0586 Epoch: 11 Global Step: 58010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:56:49,127-Speed 3415.14 samples/sec Loss 7.3795 LearningRate 0.0586 Epoch: 11 Global Step: 58020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:56:52,130-Speed 3410.91 samples/sec Loss 7.3214 LearningRate 0.0586 Epoch: 11 Global Step: 58030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:56:55,115-Speed 3432.36 samples/sec Loss 7.1649 LearningRate 0.0585 Epoch: 11 Global Step: 58040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:56:58,094-Speed 3438.28 samples/sec Loss 7.1678 LearningRate 0.0585 Epoch: 11 Global Step: 58050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:01,201-Speed 3296.33 samples/sec 
Loss 7.2981 LearningRate 0.0585 Epoch: 11 Global Step: 58060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:04,224-Speed 3388.71 samples/sec Loss 7.2650 LearningRate 0.0585 Epoch: 11 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:07,210-Speed 3430.65 samples/sec Loss 7.2327 LearningRate 0.0585 Epoch: 11 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:10,191-Speed 3437.48 samples/sec Loss 7.4053 LearningRate 0.0585 Epoch: 11 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:13,219-Speed 3381.87 samples/sec Loss 7.0350 LearningRate 0.0584 Epoch: 11 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:16,190-Speed 3447.94 samples/sec Loss 7.2012 LearningRate 0.0584 Epoch: 11 Global Step: 58110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:19,175-Speed 3431.17 samples/sec Loss 7.1781 LearningRate 0.0584 Epoch: 11 Global Step: 58120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:22,157-Speed 3434.94 samples/sec Loss 7.4450 LearningRate 0.0584 Epoch: 11 Global Step: 58130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:25,138-Speed 3436.73 samples/sec Loss 7.2154 LearningRate 0.0584 Epoch: 11 Global Step: 58140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:28,126-Speed 3427.09 samples/sec Loss 7.2424 LearningRate 0.0583 Epoch: 11 Global Step: 58150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:31,118-Speed 3423.79 samples/sec Loss 7.2625 LearningRate 0.0583 Epoch: 11 Global Step: 58160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:34,121-Speed 3410.60 samples/sec Loss 7.1869 LearningRate 0.0583 Epoch: 11 Global Step: 58170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:37,120-Speed 3415.28 samples/sec Loss 7.1426 LearningRate 0.0583 Epoch: 11 Global Step: 
58180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:40,111-Speed 3425.10 samples/sec Loss 7.2401 LearningRate 0.0583 Epoch: 11 Global Step: 58190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:43,157-Speed 3362.33 samples/sec Loss 7.2924 LearningRate 0.0583 Epoch: 11 Global Step: 58200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:57:46,149-Speed 3423.50 samples/sec Loss 7.1346 LearningRate 0.0582 Epoch: 11 Global Step: 58210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:49,228-Speed 3325.59 samples/sec Loss 7.0196 LearningRate 0.0582 Epoch: 11 Global Step: 58220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:52,319-Speed 3314.20 samples/sec Loss 7.1153 LearningRate 0.0582 Epoch: 11 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:55,319-Speed 3414.90 samples/sec Loss 7.1331 LearningRate 0.0582 Epoch: 11 Global Step: 58240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:57:58,370-Speed 3357.14 samples/sec Loss 7.1099 LearningRate 0.0582 Epoch: 11 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:58:01,391-Speed 3389.84 samples/sec Loss 7.2309 LearningRate 0.0582 Epoch: 11 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:58:04,402-Speed 3402.70 samples/sec Loss 7.2613 LearningRate 0.0581 Epoch: 11 Global Step: 58270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:58:07,385-Speed 3434.13 samples/sec Loss 7.2630 LearningRate 0.0581 Epoch: 11 Global Step: 58280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:58:10,384-Speed 3415.28 samples/sec Loss 7.2257 LearningRate 0.0581 Epoch: 11 Global Step: 58290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:58:13,397-Speed 3399.81 samples/sec Loss 7.2641 LearningRate 0.0581 Epoch: 11 Global Step: 58300 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-01-19 22:58:16,376-Speed 3437.80 samples/sec Loss 7.1203 LearningRate 0.0581 Epoch: 11 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:19,390-Speed 3400.06 samples/sec Loss 7.2462 LearningRate 0.0581 Epoch: 11 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:22,431-Speed 3368.41 samples/sec Loss 7.2404 LearningRate 0.0580 Epoch: 11 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:25,482-Speed 3356.89 samples/sec Loss 7.1306 LearningRate 0.0580 Epoch: 11 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:28,477-Speed 3420.24 samples/sec Loss 7.2808 LearningRate 0.0580 Epoch: 11 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:31,464-Speed 3429.05 samples/sec Loss 7.1086 LearningRate 0.0580 Epoch: 11 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:34,517-Speed 3354.88 samples/sec Loss 7.2295 LearningRate 0.0580 Epoch: 11 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:37,505-Speed 3427.27 samples/sec Loss 7.2409 LearningRate 0.0580 Epoch: 11 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:40,492-Speed 3429.90 samples/sec Loss 7.1088 LearningRate 0.0579 Epoch: 11 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:43,482-Speed 3425.11 samples/sec Loss 7.3306 LearningRate 0.0579 Epoch: 11 Global Step: 58400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:46,453-Speed 3447.44 samples/sec Loss 7.2413 LearningRate 0.0579 Epoch: 11 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:49,440-Speed 3429.68 samples/sec Loss 7.0623 LearningRate 0.0579 Epoch: 11 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:52,423-Speed 
3433.84 samples/sec Loss 7.0792 LearningRate 0.0579 Epoch: 11 Global Step: 58430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 22:58:55,425-Speed 3411.92 samples/sec Loss 7.2150 LearningRate 0.0579 Epoch: 11 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:58:58,404-Speed 3438.20 samples/sec Loss 7.2295 LearningRate 0.0578 Epoch: 11 Global Step: 58450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:01,381-Speed 3439.62 samples/sec Loss 7.2672 LearningRate 0.0578 Epoch: 11 Global Step: 58460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:04,365-Speed 3433.69 samples/sec Loss 7.1581 LearningRate 0.0578 Epoch: 11 Global Step: 58470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:07,342-Speed 3440.45 samples/sec Loss 7.0993 LearningRate 0.0578 Epoch: 11 Global Step: 58480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:10,321-Speed 3438.69 samples/sec Loss 7.0806 LearningRate 0.0578 Epoch: 11 Global Step: 58490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:13,317-Speed 3419.58 samples/sec Loss 7.1463 LearningRate 0.0578 Epoch: 11 Global Step: 58500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:16,298-Speed 3435.78 samples/sec Loss 7.2388 LearningRate 0.0577 Epoch: 11 Global Step: 58510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:19,345-Speed 3361.83 samples/sec Loss 7.2964 LearningRate 0.0577 Epoch: 11 Global Step: 58520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:22,418-Speed 3332.95 samples/sec Loss 7.1126 LearningRate 0.0577 Epoch: 11 Global Step: 58530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:25,387-Speed 3449.95 samples/sec Loss 7.2066 LearningRate 0.0577 Epoch: 11 Global Step: 58540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:28,364-Speed 3440.45 samples/sec Loss 7.1340 LearningRate 0.0577 
Epoch: 11 Global Step: 58550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:31,339-Speed 3444.41 samples/sec Loss 7.3368 LearningRate 0.0577 Epoch: 11 Global Step: 58560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:34,398-Speed 3347.80 samples/sec Loss 7.2256 LearningRate 0.0576 Epoch: 11 Global Step: 58570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:37,442-Speed 3374.55 samples/sec Loss 7.4144 LearningRate 0.0576 Epoch: 11 Global Step: 58580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:40,421-Speed 3437.50 samples/sec Loss 7.2051 LearningRate 0.0576 Epoch: 11 Global Step: 58590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:43,527-Speed 3298.57 samples/sec Loss 6.9446 LearningRate 0.0576 Epoch: 11 Global Step: 58600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:46,580-Speed 3354.84 samples/sec Loss 7.2275 LearningRate 0.0576 Epoch: 11 Global Step: 58610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:49,638-Speed 3349.24 samples/sec Loss 7.0523 LearningRate 0.0575 Epoch: 11 Global Step: 58620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:52,664-Speed 3384.68 samples/sec Loss 7.1862 LearningRate 0.0575 Epoch: 11 Global Step: 58630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 22:59:55,671-Speed 3408.09 samples/sec Loss 7.2123 LearningRate 0.0575 Epoch: 11 Global Step: 58640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 22:59:58,670-Speed 3415.47 samples/sec Loss 7.3261 LearningRate 0.0575 Epoch: 11 Global Step: 58650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:01,651-Speed 3436.30 samples/sec Loss 7.3202 LearningRate 0.0575 Epoch: 11 Global Step: 58660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:04,668-Speed 3395.50 samples/sec Loss 7.1569 LearningRate 0.0575 Epoch: 11 Global Step: 58670 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-19 23:00:07,667-Speed 3414.74 samples/sec Loss 7.1433 LearningRate 0.0574 Epoch: 11 Global Step: 58680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:10,648-Speed 3436.57 samples/sec Loss 7.2135 LearningRate 0.0574 Epoch: 11 Global Step: 58690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:13,637-Speed 3426.09 samples/sec Loss 7.2342 LearningRate 0.0574 Epoch: 11 Global Step: 58700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:16,620-Speed 3433.96 samples/sec Loss 7.1792 LearningRate 0.0574 Epoch: 11 Global Step: 58710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:19,611-Speed 3424.91 samples/sec Loss 7.3629 LearningRate 0.0574 Epoch: 11 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:22,610-Speed 3415.31 samples/sec Loss 7.0709 LearningRate 0.0574 Epoch: 11 Global Step: 58730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:25,591-Speed 3436.13 samples/sec Loss 7.1012 LearningRate 0.0573 Epoch: 11 Global Step: 58740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:00:28,578-Speed 3429.17 samples/sec Loss 6.9993 LearningRate 0.0573 Epoch: 11 Global Step: 58750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:00:31,541-Speed 3456.07 samples/sec Loss 7.1371 LearningRate 0.0573 Epoch: 11 Global Step: 58760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:34,527-Speed 3430.78 samples/sec Loss 7.1575 LearningRate 0.0573 Epoch: 11 Global Step: 58770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:37,509-Speed 3435.31 samples/sec Loss 7.0652 LearningRate 0.0573 Epoch: 11 Global Step: 58780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:40,494-Speed 3431.21 samples/sec Loss 7.1275 LearningRate 0.0573 Epoch: 11 Global Step: 58790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 
23:00:43,477-Speed 3433.91 samples/sec Loss 7.1451 LearningRate 0.0572 Epoch: 11 Global Step: 58800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:46,456-Speed 3438.06 samples/sec Loss 7.0429 LearningRate 0.0572 Epoch: 11 Global Step: 58810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:49,489-Speed 3376.77 samples/sec Loss 7.2372 LearningRate 0.0572 Epoch: 11 Global Step: 58820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:52,575-Speed 3319.22 samples/sec Loss 7.1146 LearningRate 0.0572 Epoch: 11 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:55,660-Speed 3319.86 samples/sec Loss 7.1034 LearningRate 0.0572 Epoch: 11 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:00:58,681-Speed 3390.23 samples/sec Loss 7.0578 LearningRate 0.0572 Epoch: 11 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:01,682-Speed 3413.66 samples/sec Loss 7.2516 LearningRate 0.0571 Epoch: 11 Global Step: 58860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:01:04,723-Speed 3368.07 samples/sec Loss 7.0724 LearningRate 0.0571 Epoch: 11 Global Step: 58870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:01:07,698-Speed 3442.84 samples/sec Loss 7.0489 LearningRate 0.0571 Epoch: 11 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:10,691-Speed 3423.33 samples/sec Loss 7.1825 LearningRate 0.0571 Epoch: 11 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:13,681-Speed 3424.91 samples/sec Loss 7.0368 LearningRate 0.0571 Epoch: 11 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:16,819-Speed 3263.84 samples/sec Loss 7.1373 LearningRate 0.0571 Epoch: 11 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:19,826-Speed 3406.28 samples/sec Loss 7.2197 
LearningRate 0.0570 Epoch: 11 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:22,807-Speed 3436.31 samples/sec Loss 6.9609 LearningRate 0.0570 Epoch: 11 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:25,886-Speed 3327.49 samples/sec Loss 7.1739 LearningRate 0.0570 Epoch: 11 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:28,874-Speed 3427.23 samples/sec Loss 7.0382 LearningRate 0.0570 Epoch: 11 Global Step: 58950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:31,879-Speed 3408.56 samples/sec Loss 7.1410 LearningRate 0.0570 Epoch: 11 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:34,864-Speed 3432.30 samples/sec Loss 7.1003 LearningRate 0.0570 Epoch: 11 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:37,860-Speed 3418.47 samples/sec Loss 7.0397 LearningRate 0.0569 Epoch: 11 Global Step: 58980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:01:40,852-Speed 3423.38 samples/sec Loss 7.0449 LearningRate 0.0569 Epoch: 11 Global Step: 58990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:01:43,858-Speed 3407.56 samples/sec Loss 7.2153 LearningRate 0.0569 Epoch: 11 Global Step: 59000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:01:46,844-Speed 3429.61 samples/sec Loss 7.1187 LearningRate 0.0569 Epoch: 11 Global Step: 59010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:01:49,834-Speed 3426.74 samples/sec Loss 7.1609 LearningRate 0.0569 Epoch: 11 Global Step: 59020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:01:52,902-Speed 3337.80 samples/sec Loss 7.1903 LearningRate 0.0569 Epoch: 11 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:01:55,923-Speed 3390.46 samples/sec Loss 7.0931 LearningRate 0.0568 Epoch: 11 Global Step: 59040 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:01:58,906-Speed 3434.33 samples/sec Loss 7.2072 LearningRate 0.0568 Epoch: 11 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:01,909-Speed 3411.03 samples/sec Loss 6.9950 LearningRate 0.0568 Epoch: 11 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:04,895-Speed 3430.14 samples/sec Loss 7.3733 LearningRate 0.0568 Epoch: 11 Global Step: 59070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:07,876-Speed 3435.77 samples/sec Loss 7.2145 LearningRate 0.0568 Epoch: 11 Global Step: 59080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:10,864-Speed 3428.21 samples/sec Loss 7.0048 LearningRate 0.0568 Epoch: 11 Global Step: 59090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:13,887-Speed 3388.39 samples/sec Loss 7.1303 LearningRate 0.0567 Epoch: 11 Global Step: 59100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:16,967-Speed 3324.68 samples/sec Loss 7.2006 LearningRate 0.0567 Epoch: 11 Global Step: 59110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:19,957-Speed 3426.82 samples/sec Loss 7.2254 LearningRate 0.0567 Epoch: 11 Global Step: 59120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:22,951-Speed 3420.36 samples/sec Loss 6.9666 LearningRate 0.0567 Epoch: 11 Global Step: 59130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:02:25,946-Speed 3420.52 samples/sec Loss 7.1082 LearningRate 0.0567 Epoch: 11 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:28,955-Speed 3403.83 samples/sec Loss 7.1093 LearningRate 0.0567 Epoch: 11 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:32,033-Speed 3327.87 samples/sec Loss 7.1067 LearningRate 0.0566 Epoch: 11 Global Step: 59160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-01-19 23:02:35,114-Speed 3324.54 samples/sec Loss 7.1000 LearningRate 0.0566 Epoch: 11 Global Step: 59170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:38,091-Speed 3439.87 samples/sec Loss 7.1209 LearningRate 0.0566 Epoch: 11 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:41,074-Speed 3434.88 samples/sec Loss 6.9646 LearningRate 0.0566 Epoch: 11 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:44,079-Speed 3408.25 samples/sec Loss 7.1386 LearningRate 0.0566 Epoch: 11 Global Step: 59200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:47,072-Speed 3421.32 samples/sec Loss 6.9955 LearningRate 0.0566 Epoch: 11 Global Step: 59210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:50,056-Speed 3434.22 samples/sec Loss 7.1059 LearningRate 0.0565 Epoch: 11 Global Step: 59220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:53,102-Speed 3362.01 samples/sec Loss 7.1881 LearningRate 0.0565 Epoch: 11 Global Step: 59230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:56,074-Speed 3446.84 samples/sec Loss 7.0737 LearningRate 0.0565 Epoch: 11 Global Step: 59240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:02:59,044-Speed 3448.11 samples/sec Loss 7.1787 LearningRate 0.0565 Epoch: 11 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:02,025-Speed 3435.58 samples/sec Loss 7.2204 LearningRate 0.0565 Epoch: 11 Global Step: 59260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:05,017-Speed 3423.73 samples/sec Loss 7.2034 LearningRate 0.0565 Epoch: 11 Global Step: 59270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:08,162-Speed 3256.84 samples/sec Loss 7.2094 LearningRate 0.0564 Epoch: 11 Global Step: 59280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:11,188-Speed 3385.19 
samples/sec Loss 7.0007 LearningRate 0.0564 Epoch: 11 Global Step: 59290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:14,338-Speed 3251.60 samples/sec Loss 7.2278 LearningRate 0.0564 Epoch: 11 Global Step: 59300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:17,338-Speed 3413.90 samples/sec Loss 7.1772 LearningRate 0.0564 Epoch: 11 Global Step: 59310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:20,318-Speed 3437.37 samples/sec Loss 7.2178 LearningRate 0.0564 Epoch: 11 Global Step: 59320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:23,324-Speed 3407.64 samples/sec Loss 7.3134 LearningRate 0.0564 Epoch: 11 Global Step: 59330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:26,308-Speed 3432.83 samples/sec Loss 7.1294 LearningRate 0.0563 Epoch: 11 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:29,310-Speed 3411.69 samples/sec Loss 7.0812 LearningRate 0.0563 Epoch: 11 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:03:32,297-Speed 3428.81 samples/sec Loss 7.2404 LearningRate 0.0563 Epoch: 11 Global Step: 59360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:03:35,303-Speed 3408.60 samples/sec Loss 7.2924 LearningRate 0.0563 Epoch: 11 Global Step: 59370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:03:38,306-Speed 3410.45 samples/sec Loss 7.1033 LearningRate 0.0563 Epoch: 11 Global Step: 59380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:03:41,285-Speed 3438.22 samples/sec Loss 7.1859 LearningRate 0.0562 Epoch: 11 Global Step: 59390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:03:44,253-Speed 3450.38 samples/sec Loss 6.9861 LearningRate 0.0562 Epoch: 11 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:47,242-Speed 3426.78 samples/sec Loss 6.9931 LearningRate 0.0562 Epoch: 
11 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:50,220-Speed 3439.85 samples/sec Loss 7.0205 LearningRate 0.0562 Epoch: 11 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:53,204-Speed 3432.32 samples/sec Loss 7.0829 LearningRate 0.0562 Epoch: 11 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:56,209-Speed 3409.42 samples/sec Loss 7.0712 LearningRate 0.0562 Epoch: 11 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:03:59,200-Speed 3424.62 samples/sec Loss 7.0763 LearningRate 0.0561 Epoch: 11 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:02,180-Speed 3436.37 samples/sec Loss 7.3172 LearningRate 0.0561 Epoch: 11 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:05,168-Speed 3428.97 samples/sec Loss 6.9604 LearningRate 0.0561 Epoch: 11 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:08,169-Speed 3412.11 samples/sec Loss 7.2233 LearningRate 0.0561 Epoch: 11 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:11,151-Speed 3435.16 samples/sec Loss 6.8904 LearningRate 0.0561 Epoch: 11 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:14,147-Speed 3419.08 samples/sec Loss 7.0399 LearningRate 0.0561 Epoch: 11 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:04:17,140-Speed 3421.95 samples/sec Loss 6.9471 LearningRate 0.0560 Epoch: 11 Global Step: 59510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:04:20,131-Speed 3424.47 samples/sec Loss 7.1289 LearningRate 0.0560 Epoch: 11 Global Step: 59520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:04:23,135-Speed 3410.02 samples/sec Loss 7.0254 LearningRate 0.0560 Epoch: 11 Global Step: 59530 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-19 23:04:26,119-Speed 3432.70 samples/sec Loss 7.0476 LearningRate 0.0560 Epoch: 11 Global Step: 59540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:04:29,106-Speed 3428.62 samples/sec Loss 7.1273 LearningRate 0.0560 Epoch: 11 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:04:32,123-Speed 3395.55 samples/sec Loss 7.1629 LearningRate 0.0560 Epoch: 11 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:04:35,123-Speed 3413.94 samples/sec Loss 7.1432 LearningRate 0.0559 Epoch: 11 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:04:38,109-Speed 3429.95 samples/sec Loss 7.1716 LearningRate 0.0559 Epoch: 11 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:04:41,085-Speed 3442.62 samples/sec Loss 7.0162 LearningRate 0.0559 Epoch: 11 Global Step: 59590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:44,077-Speed 3423.76 samples/sec Loss 7.1386 LearningRate 0.0559 Epoch: 11 Global Step: 59600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:47,068-Speed 3423.87 samples/sec Loss 7.0428 LearningRate 0.0559 Epoch: 11 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:50,113-Speed 3368.10 samples/sec Loss 6.9091 LearningRate 0.0559 Epoch: 11 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:53,173-Speed 3346.63 samples/sec Loss 7.0077 LearningRate 0.0558 Epoch: 11 Global Step: 59630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:04:56,177-Speed 3410.88 samples/sec Loss 7.1269 LearningRate 0.0558 Epoch: 11 Global Step: 59640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:04:59,193-Speed 3395.69 samples/sec Loss 7.1128 LearningRate 0.0558 Epoch: 11 Global Step: 59650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 
23:05:02,193-Speed 3414.81 samples/sec Loss 7.0303 LearningRate 0.0558 Epoch: 11 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:05:05,180-Speed 3428.54 samples/sec Loss 7.2753 LearningRate 0.0558 Epoch: 11 Global Step: 59670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:05:08,166-Speed 3430.26 samples/sec Loss 6.9668 LearningRate 0.0558 Epoch: 11 Global Step: 59680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:05:11,154-Speed 3428.37 samples/sec Loss 6.9956 LearningRate 0.0557 Epoch: 11 Global Step: 59690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:05:14,140-Speed 3430.06 samples/sec Loss 7.1930 LearningRate 0.0557 Epoch: 11 Global Step: 59700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:05:17,206-Speed 3341.02 samples/sec Loss 6.9950 LearningRate 0.0557 Epoch: 11 Global Step: 59710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:05:20,227-Speed 3390.56 samples/sec Loss 6.8647 LearningRate 0.0557 Epoch: 11 Global Step: 59720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:05:23,273-Speed 3362.81 samples/sec Loss 7.0854 LearningRate 0.0557 Epoch: 11 Global Step: 59730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:05:26,253-Speed 3437.03 samples/sec Loss 7.0410 LearningRate 0.0557 Epoch: 11 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:29,245-Speed 3423.43 samples/sec Loss 7.0983 LearningRate 0.0556 Epoch: 11 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:32,247-Speed 3411.23 samples/sec Loss 6.9908 LearningRate 0.0556 Epoch: 11 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:35,393-Speed 3256.18 samples/sec Loss 6.9591 LearningRate 0.0556 Epoch: 11 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:38,393-Speed 3414.85 samples/sec Loss 6.9132 
LearningRate 0.0556 Epoch: 11 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:41,379-Speed 3430.50 samples/sec Loss 7.1695 LearningRate 0.0556 Epoch: 11 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:44,374-Speed 3419.61 samples/sec Loss 7.0804 LearningRate 0.0556 Epoch: 11 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:47,358-Speed 3431.34 samples/sec Loss 7.0761 LearningRate 0.0555 Epoch: 11 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:50,358-Speed 3415.11 samples/sec Loss 7.0185 LearningRate 0.0555 Epoch: 11 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:53,364-Speed 3407.28 samples/sec Loss 7.0159 LearningRate 0.0555 Epoch: 11 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:05:56,382-Speed 3393.96 samples/sec Loss 7.1005 LearningRate 0.0555 Epoch: 11 Global Step: 59840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:05:59,378-Speed 3418.91 samples/sec Loss 6.9961 LearningRate 0.0555 Epoch: 11 Global Step: 59850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:06:02,349-Speed 3446.82 samples/sec Loss 6.9323 LearningRate 0.0555 Epoch: 11 Global Step: 59860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:05,344-Speed 3421.16 samples/sec Loss 7.0062 LearningRate 0.0554 Epoch: 11 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:08,326-Speed 3434.27 samples/sec Loss 7.0159 LearningRate 0.0554 Epoch: 11 Global Step: 59880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:11,419-Speed 3311.45 samples/sec Loss 7.0255 LearningRate 0.0554 Epoch: 11 Global Step: 59890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:14,417-Speed 3416.18 samples/sec Loss 7.1811 LearningRate 0.0554 Epoch: 11 Global Step: 59900 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:17,433-Speed 3396.32 samples/sec Loss 7.1193 LearningRate 0.0554 Epoch: 11 Global Step: 59910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:20,420-Speed 3429.06 samples/sec Loss 7.0968 LearningRate 0.0554 Epoch: 11 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:23,403-Speed 3434.14 samples/sec Loss 6.9232 LearningRate 0.0553 Epoch: 11 Global Step: 59930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:26,386-Speed 3433.46 samples/sec Loss 7.0658 LearningRate 0.0553 Epoch: 11 Global Step: 59940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:29,384-Speed 3417.24 samples/sec Loss 7.1063 LearningRate 0.0553 Epoch: 11 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:32,369-Speed 3431.80 samples/sec Loss 7.1891 LearningRate 0.0553 Epoch: 11 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:06:35,354-Speed 3430.89 samples/sec Loss 7.1231 LearningRate 0.0553 Epoch: 11 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:06:38,325-Speed 3447.83 samples/sec Loss 7.0927 LearningRate 0.0553 Epoch: 11 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:41,345-Speed 3391.28 samples/sec Loss 7.2061 LearningRate 0.0552 Epoch: 11 Global Step: 59990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:06:44,334-Speed 3426.43 samples/sec Loss 6.9814 LearningRate 0.0552 Epoch: 11 Global Step: 60000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:07:27,251-[lfw][60000]XNorm: 22.706838 Training: 2022-01-19 23:07:27,252-[lfw][60000]Accuracy-Flip: 0.99650+-0.00353 Training: 2022-01-19 23:07:27,252-[lfw][60000]Accuracy-Highest: 0.99767 Training: 2022-01-19 23:08:17,253-[cfp_fp][60000]XNorm: 20.035312 Training: 2022-01-19 
23:08:17,254-[cfp_fp][60000]Accuracy-Flip: 0.96614+-0.00874 Training: 2022-01-19 23:08:17,254-[cfp_fp][60000]Accuracy-Highest: 0.96829 Training: 2022-01-19 23:09:00,121-[agedb_30][60000]XNorm: 22.407023 Training: 2022-01-19 23:09:00,122-[agedb_30][60000]Accuracy-Flip: 0.97533+-0.00839 Training: 2022-01-19 23:09:00,122-[agedb_30][60000]Accuracy-Highest: 0.97667 Training: 2022-01-19 23:09:03,100-Speed 73.79 samples/sec Loss 7.0341 LearningRate 0.0552 Epoch: 11 Global Step: 60010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:06,071-Speed 3448.31 samples/sec Loss 7.1938 LearningRate 0.0552 Epoch: 11 Global Step: 60020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:09,048-Speed 3439.74 samples/sec Loss 7.0562 LearningRate 0.0552 Epoch: 11 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:12,023-Speed 3443.36 samples/sec Loss 7.0435 LearningRate 0.0552 Epoch: 11 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:15,004-Speed 3436.18 samples/sec Loss 7.0955 LearningRate 0.0551 Epoch: 11 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:17,983-Speed 3438.59 samples/sec Loss 7.0420 LearningRate 0.0551 Epoch: 11 Global Step: 60060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:20,969-Speed 3429.91 samples/sec Loss 6.9625 LearningRate 0.0551 Epoch: 11 Global Step: 60070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:23,962-Speed 3421.83 samples/sec Loss 7.1198 LearningRate 0.0551 Epoch: 11 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:09:26,940-Speed 3439.72 samples/sec Loss 6.9153 LearningRate 0.0551 Epoch: 11 Global Step: 60090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:09:29,938-Speed 3416.68 samples/sec Loss 6.9135 LearningRate 0.0551 Epoch: 11 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-19 23:09:32,933-Speed 3420.22 samples/sec Loss 7.3870 LearningRate 0.0550 Epoch: 11 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:35,917-Speed 3432.04 samples/sec Loss 7.1048 LearningRate 0.0550 Epoch: 11 Global Step: 60120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:38,897-Speed 3437.38 samples/sec Loss 7.1583 LearningRate 0.0550 Epoch: 11 Global Step: 60130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:41,880-Speed 3434.03 samples/sec Loss 7.0800 LearningRate 0.0550 Epoch: 11 Global Step: 60140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:44,862-Speed 3434.42 samples/sec Loss 7.0646 LearningRate 0.0550 Epoch: 11 Global Step: 60150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:47,858-Speed 3418.81 samples/sec Loss 7.1468 LearningRate 0.0550 Epoch: 11 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:50,850-Speed 3423.45 samples/sec Loss 6.8895 LearningRate 0.0549 Epoch: 11 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:53,856-Speed 3407.37 samples/sec Loss 7.1522 LearningRate 0.0549 Epoch: 11 Global Step: 60180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:56,872-Speed 3395.63 samples/sec Loss 6.9888 LearningRate 0.0549 Epoch: 11 Global Step: 60190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:09:59,916-Speed 3364.75 samples/sec Loss 6.8119 LearningRate 0.0549 Epoch: 11 Global Step: 60200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:02,910-Speed 3422.33 samples/sec Loss 6.9306 LearningRate 0.0549 Epoch: 11 Global Step: 60210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:05,895-Speed 3430.55 samples/sec Loss 6.9972 LearningRate 0.0549 Epoch: 11 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:08,916-Speed 3390.55 samples/sec 
Loss 7.0507 LearningRate 0.0548 Epoch: 11 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:11,954-Speed 3372.56 samples/sec Loss 7.0464 LearningRate 0.0548 Epoch: 11 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:14,980-Speed 3384.61 samples/sec Loss 6.9945 LearningRate 0.0548 Epoch: 11 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:18,085-Speed 3299.05 samples/sec Loss 6.8550 LearningRate 0.0548 Epoch: 11 Global Step: 60260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:21,074-Speed 3425.93 samples/sec Loss 7.0219 LearningRate 0.0548 Epoch: 11 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:24,065-Speed 3424.29 samples/sec Loss 6.9471 LearningRate 0.0548 Epoch: 11 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:27,065-Speed 3415.28 samples/sec Loss 7.1416 LearningRate 0.0547 Epoch: 11 Global Step: 60290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:30,048-Speed 3433.21 samples/sec Loss 7.0126 LearningRate 0.0547 Epoch: 11 Global Step: 60300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-19 23:10:33,038-Speed 3426.77 samples/sec Loss 6.9713 LearningRate 0.0547 Epoch: 11 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:10:36,076-Speed 3371.19 samples/sec Loss 7.0669 LearningRate 0.0547 Epoch: 11 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:10:39,217-Speed 3260.39 samples/sec Loss 7.1599 LearningRate 0.0547 Epoch: 11 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:10:42,262-Speed 3364.66 samples/sec Loss 7.0304 LearningRate 0.0547 Epoch: 11 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:10:45,247-Speed 3431.13 samples/sec Loss 7.0048 LearningRate 0.0547 Epoch: 11 
Global Step: 60350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:10:48,245-Speed 3416.65 samples/sec Loss 6.9814 LearningRate 0.0546 Epoch: 11 Global Step: 60360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:10:51,227-Speed 3434.33 samples/sec Loss 7.1094 LearningRate 0.0546 Epoch: 11 Global Step: 60370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:10:54,208-Speed 3435.91 samples/sec Loss 7.1036 LearningRate 0.0546 Epoch: 11 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:10:57,206-Speed 3430.44 samples/sec Loss 6.9789 LearningRate 0.0546 Epoch: 11 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:00,186-Speed 3437.16 samples/sec Loss 6.9315 LearningRate 0.0546 Epoch: 11 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:03,228-Speed 3367.45 samples/sec Loss 7.1526 LearningRate 0.0546 Epoch: 11 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:06,198-Speed 3447.72 samples/sec Loss 6.9947 LearningRate 0.0545 Epoch: 11 Global Step: 60420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:09,185-Speed 3429.46 samples/sec Loss 7.1775 LearningRate 0.0545 Epoch: 11 Global Step: 60430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:12,158-Speed 3445.96 samples/sec Loss 7.0046 LearningRate 0.0545 Epoch: 11 Global Step: 60440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:15,155-Speed 3417.51 samples/sec Loss 6.9991 LearningRate 0.0545 Epoch: 11 Global Step: 60450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:18,136-Speed 3436.41 samples/sec Loss 7.0907 LearningRate 0.0545 Epoch: 11 Global Step: 60460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:21,114-Speed 3438.62 samples/sec Loss 7.0219 LearningRate 0.0545 Epoch: 11 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-01-19 23:11:24,096-Speed 3435.44 samples/sec Loss 6.9901 LearningRate 0.0544 Epoch: 11 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:27,092-Speed 3418.80 samples/sec Loss 6.7888 LearningRate 0.0544 Epoch: 11 Global Step: 60490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:30,083-Speed 3425.04 samples/sec Loss 7.0328 LearningRate 0.0544 Epoch: 11 Global Step: 60500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:33,061-Speed 3439.71 samples/sec Loss 6.9257 LearningRate 0.0544 Epoch: 11 Global Step: 60510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:36,038-Speed 3440.55 samples/sec Loss 7.0399 LearningRate 0.0544 Epoch: 11 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:11:39,021-Speed 3432.87 samples/sec Loss 6.8556 LearningRate 0.0544 Epoch: 11 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:11:42,003-Speed 3435.93 samples/sec Loss 6.9050 LearningRate 0.0543 Epoch: 11 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:11:44,981-Speed 3438.30 samples/sec Loss 7.0131 LearningRate 0.0543 Epoch: 11 Global Step: 60550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:11:47,973-Speed 3423.58 samples/sec Loss 7.0120 LearningRate 0.0543 Epoch: 11 Global Step: 60560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:11:50,962-Speed 3427.40 samples/sec Loss 7.0086 LearningRate 0.0543 Epoch: 11 Global Step: 60570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:11:53,964-Speed 3411.94 samples/sec Loss 6.9974 LearningRate 0.0543 Epoch: 11 Global Step: 60580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:11:56,957-Speed 3422.85 samples/sec Loss 7.1359 LearningRate 0.0543 Epoch: 11 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:11:59,987-Speed 
3379.67 samples/sec Loss 6.9329 LearningRate 0.0542 Epoch: 11 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:02,973-Speed 3430.31 samples/sec Loss 7.0190 LearningRate 0.0542 Epoch: 11 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:05,990-Speed 3394.89 samples/sec Loss 6.9441 LearningRate 0.0542 Epoch: 11 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:09,003-Speed 3399.70 samples/sec Loss 7.0302 LearningRate 0.0542 Epoch: 11 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:12,008-Speed 3408.23 samples/sec Loss 7.0824 LearningRate 0.0542 Epoch: 11 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:14,995-Speed 3428.50 samples/sec Loss 7.0002 LearningRate 0.0542 Epoch: 11 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:17,978-Speed 3435.15 samples/sec Loss 7.1830 LearningRate 0.0541 Epoch: 11 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:20,997-Speed 3392.28 samples/sec Loss 6.9262 LearningRate 0.0541 Epoch: 11 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:23,986-Speed 3426.82 samples/sec Loss 6.9994 LearningRate 0.0541 Epoch: 11 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:27,102-Speed 3288.23 samples/sec Loss 7.0758 LearningRate 0.0541 Epoch: 11 Global Step: 60690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:12:41,032-Speed 735.15 samples/sec Loss 6.7237 LearningRate 0.0541 Epoch: 12 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:44,068-Speed 3425.12 samples/sec Loss 6.0468 LearningRate 0.0541 Epoch: 12 Global Step: 60710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:47,219-Speed 3250.43 samples/sec Loss 6.1557 LearningRate 0.0540 
Epoch: 12 Global Step: 60720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:50,231-Speed 3434.23 samples/sec Loss 6.1597 LearningRate 0.0540 Epoch: 12 Global Step: 60730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:53,262-Speed 3379.33 samples/sec Loss 6.2016 LearningRate 0.0540 Epoch: 12 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:56,277-Speed 3396.98 samples/sec Loss 6.2217 LearningRate 0.0540 Epoch: 12 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:12:59,251-Speed 3444.17 samples/sec Loss 6.1486 LearningRate 0.0540 Epoch: 12 Global Step: 60760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:02,291-Speed 3429.53 samples/sec Loss 6.3338 LearningRate 0.0540 Epoch: 12 Global Step: 60770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:05,269-Speed 3438.30 samples/sec Loss 6.2674 LearningRate 0.0539 Epoch: 12 Global Step: 60780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:08,607-Speed 3405.32 samples/sec Loss 6.2182 LearningRate 0.0539 Epoch: 12 Global Step: 60790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:11,612-Speed 3409.75 samples/sec Loss 6.4384 LearningRate 0.0539 Epoch: 12 Global Step: 60800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:14,594-Speed 3434.89 samples/sec Loss 6.2824 LearningRate 0.0539 Epoch: 12 Global Step: 60810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:17,578-Speed 3432.59 samples/sec Loss 6.2955 LearningRate 0.0539 Epoch: 12 Global Step: 60820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:20,570-Speed 3423.13 samples/sec Loss 6.2855 LearningRate 0.0539 Epoch: 12 Global Step: 60830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:23,570-Speed 3413.83 samples/sec Loss 6.3455 LearningRate 0.0538 Epoch: 12 Global Step: 60840 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-01-19 23:13:26,623-Speed 3355.86 samples/sec Loss 6.5035 LearningRate 0.0538 Epoch: 12 Global Step: 60850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:13:29,602-Speed 3438.03 samples/sec Loss 6.4117 LearningRate 0.0538 Epoch: 12 Global Step: 60860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:32,604-Speed 3412.84 samples/sec Loss 6.2961 LearningRate 0.0538 Epoch: 12 Global Step: 60870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:35,597-Speed 3420.90 samples/sec Loss 6.3522 LearningRate 0.0538 Epoch: 12 Global Step: 60880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:38,613-Speed 3396.51 samples/sec Loss 6.2853 LearningRate 0.0538 Epoch: 12 Global Step: 60890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:41,666-Speed 3355.04 samples/sec Loss 6.4208 LearningRate 0.0537 Epoch: 12 Global Step: 60900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:44,846-Speed 3221.68 samples/sec Loss 6.5801 LearningRate 0.0537 Epoch: 12 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:47,921-Speed 3331.14 samples/sec Loss 6.3063 LearningRate 0.0537 Epoch: 12 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:51,005-Speed 3320.48 samples/sec Loss 6.4409 LearningRate 0.0537 Epoch: 12 Global Step: 60930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:53,990-Speed 3431.46 samples/sec Loss 6.3902 LearningRate 0.0537 Epoch: 12 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:13:57,011-Speed 3390.54 samples/sec Loss 6.4865 LearningRate 0.0537 Epoch: 12 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:00,101-Speed 3314.80 samples/sec Loss 6.3349 LearningRate 0.0536 Epoch: 12 Global Step: 60960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 
23:14:03,111-Speed 3403.74 samples/sec Loss 6.4505 LearningRate 0.0536 Epoch: 12 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:14:06,097-Speed 3429.26 samples/sec Loss 6.5613 LearningRate 0.0536 Epoch: 12 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:09,161-Speed 3343.23 samples/sec Loss 6.4432 LearningRate 0.0536 Epoch: 12 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:12,208-Speed 3362.35 samples/sec Loss 6.3947 LearningRate 0.0536 Epoch: 12 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:15,212-Speed 3409.08 samples/sec Loss 6.5585 LearningRate 0.0536 Epoch: 12 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:18,221-Speed 3404.49 samples/sec Loss 6.5536 LearningRate 0.0535 Epoch: 12 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:21,226-Speed 3407.81 samples/sec Loss 6.5604 LearningRate 0.0535 Epoch: 12 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:24,226-Speed 3413.91 samples/sec Loss 6.5186 LearningRate 0.0535 Epoch: 12 Global Step: 61040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:27,237-Speed 3402.76 samples/sec Loss 6.4701 LearningRate 0.0535 Epoch: 12 Global Step: 61050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:30,243-Speed 3407.29 samples/sec Loss 6.3535 LearningRate 0.0535 Epoch: 12 Global Step: 61060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:33,308-Speed 3342.58 samples/sec Loss 6.6423 LearningRate 0.0535 Epoch: 12 Global Step: 61070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:36,357-Speed 3358.66 samples/sec Loss 6.4882 LearningRate 0.0535 Epoch: 12 Global Step: 61080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:14:39,485-Speed 3274.53 samples/sec Loss 6.6369 
LearningRate 0.0534 Epoch: 12 Global Step: 61090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:14:42,503-Speed 3394.57 samples/sec Loss 6.6107 LearningRate 0.0534 Epoch: 12 Global Step: 61100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:14:45,505-Speed 3411.04 samples/sec Loss 6.4151 LearningRate 0.0534 Epoch: 12 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:48,544-Speed 3370.84 samples/sec Loss 6.4839 LearningRate 0.0534 Epoch: 12 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:51,555-Speed 3401.89 samples/sec Loss 6.6749 LearningRate 0.0534 Epoch: 12 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:54,610-Speed 3352.97 samples/sec Loss 6.5321 LearningRate 0.0534 Epoch: 12 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:14:57,599-Speed 3427.52 samples/sec Loss 6.5329 LearningRate 0.0533 Epoch: 12 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:15:00,580-Speed 3436.39 samples/sec Loss 6.6474 LearningRate 0.0533 Epoch: 12 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:15:03,572-Speed 3422.67 samples/sec Loss 6.6439 LearningRate 0.0533 Epoch: 12 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:15:06,580-Speed 3405.16 samples/sec Loss 6.5958 LearningRate 0.0533 Epoch: 12 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:15:09,566-Speed 3430.17 samples/sec Loss 6.6968 LearningRate 0.0533 Epoch: 12 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:15:12,548-Speed 3435.06 samples/sec Loss 6.7174 LearningRate 0.0533 Epoch: 12 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:15:15,549-Speed 3412.50 samples/sec Loss 6.4384 LearningRate 0.0532 Epoch: 12 Global Step: 61210 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:15:18,527-Speed 3440.09 samples/sec Loss 6.7171 LearningRate 0.0532 Epoch: 12 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:15:21,495-Speed 3451.24 samples/sec Loss 6.5987 LearningRate 0.0532 Epoch: 12 Global Step: 61230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:24,482-Speed 3429.68 samples/sec Loss 6.4237 LearningRate 0.0532 Epoch: 12 Global Step: 61240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:27,480-Speed 3415.75 samples/sec Loss 6.6548 LearningRate 0.0532 Epoch: 12 Global Step: 61250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:30,485-Speed 3408.98 samples/sec Loss 6.4541 LearningRate 0.0532 Epoch: 12 Global Step: 61260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:33,465-Speed 3436.93 samples/sec Loss 6.6056 LearningRate 0.0531 Epoch: 12 Global Step: 61270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:36,512-Speed 3361.26 samples/sec Loss 6.6964 LearningRate 0.0531 Epoch: 12 Global Step: 61280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:39,566-Speed 3354.59 samples/sec Loss 6.4322 LearningRate 0.0531 Epoch: 12 Global Step: 61290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:42,585-Speed 3392.29 samples/sec Loss 6.6810 LearningRate 0.0531 Epoch: 12 Global Step: 61300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:45,571-Speed 3430.73 samples/sec Loss 6.4053 LearningRate 0.0531 Epoch: 12 Global Step: 61310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:48,556-Speed 3431.60 samples/sec Loss 6.6951 LearningRate 0.0531 Epoch: 12 Global Step: 61320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:15:51,544-Speed 3427.72 samples/sec Loss 6.5018 LearningRate 0.0530 Epoch: 12 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-19 23:15:54,527-Speed 3433.85 samples/sec Loss 6.6978 LearningRate 0.0530 Epoch: 12 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:15:57,506-Speed 3437.67 samples/sec Loss 6.5469 LearningRate 0.0530 Epoch: 12 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:16:00,487-Speed 3435.72 samples/sec Loss 6.7354 LearningRate 0.0530 Epoch: 12 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:16:03,493-Speed 3407.97 samples/sec Loss 6.6890 LearningRate 0.0530 Epoch: 12 Global Step: 61370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:16:06,531-Speed 3372.04 samples/sec Loss 6.6328 LearningRate 0.0530 Epoch: 12 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:16:09,528-Speed 3416.64 samples/sec Loss 6.7512 LearningRate 0.0529 Epoch: 12 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:16:12,509-Speed 3436.95 samples/sec Loss 6.8688 LearningRate 0.0529 Epoch: 12 Global Step: 61400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:16:15,491-Speed 3434.60 samples/sec Loss 6.7346 LearningRate 0.0529 Epoch: 12 Global Step: 61410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:16:18,489-Speed 3417.27 samples/sec Loss 6.4594 LearningRate 0.0529 Epoch: 12 Global Step: 61420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:16:21,493-Speed 3408.87 samples/sec Loss 6.5650 LearningRate 0.0529 Epoch: 12 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:24,497-Speed 3409.60 samples/sec Loss 6.8334 LearningRate 0.0529 Epoch: 12 Global Step: 61440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:27,542-Speed 3364.30 samples/sec Loss 6.7187 LearningRate 0.0528 Epoch: 12 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:30,526-Speed 3431.68 samples/sec 
Loss 6.7803 LearningRate 0.0528 Epoch: 12 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:33,513-Speed 3430.87 samples/sec Loss 6.6789 LearningRate 0.0528 Epoch: 12 Global Step: 61470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:36,495-Speed 3434.19 samples/sec Loss 6.5651 LearningRate 0.0528 Epoch: 12 Global Step: 61480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:39,558-Speed 3343.62 samples/sec Loss 6.6743 LearningRate 0.0528 Epoch: 12 Global Step: 61490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:42,550-Speed 3423.72 samples/sec Loss 6.6243 LearningRate 0.0528 Epoch: 12 Global Step: 61500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:45,541-Speed 3424.92 samples/sec Loss 6.8663 LearningRate 0.0527 Epoch: 12 Global Step: 61510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:48,566-Speed 3386.15 samples/sec Loss 6.7408 LearningRate 0.0527 Epoch: 12 Global Step: 61520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:51,525-Speed 3461.41 samples/sec Loss 6.6898 LearningRate 0.0527 Epoch: 12 Global Step: 61530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:54,520-Speed 3419.91 samples/sec Loss 6.7856 LearningRate 0.0527 Epoch: 12 Global Step: 61540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:16:57,488-Speed 3451.18 samples/sec Loss 6.7853 LearningRate 0.0527 Epoch: 12 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:00,573-Speed 3320.00 samples/sec Loss 6.6073 LearningRate 0.0527 Epoch: 12 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:03,589-Speed 3396.34 samples/sec Loss 6.6772 LearningRate 0.0527 Epoch: 12 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:06,593-Speed 3409.66 samples/sec Loss 6.6729 LearningRate 0.0526 Epoch: 12 
Global Step: 61580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:09,577-Speed 3432.70 samples/sec Loss 6.7821 LearningRate 0.0526 Epoch: 12 Global Step: 61590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:12,576-Speed 3416.20 samples/sec Loss 6.8293 LearningRate 0.0526 Epoch: 12 Global Step: 61600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:15,546-Speed 3447.81 samples/sec Loss 6.7211 LearningRate 0.0526 Epoch: 12 Global Step: 61610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:18,537-Speed 3424.91 samples/sec Loss 6.6251 LearningRate 0.0526 Epoch: 12 Global Step: 61620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:21,548-Speed 3402.41 samples/sec Loss 6.7076 LearningRate 0.0526 Epoch: 12 Global Step: 61630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:24,540-Speed 3423.18 samples/sec Loss 6.5688 LearningRate 0.0525 Epoch: 12 Global Step: 61640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:27,519-Speed 3437.78 samples/sec Loss 6.7170 LearningRate 0.0525 Epoch: 12 Global Step: 61650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:30,509-Speed 3426.37 samples/sec Loss 6.8706 LearningRate 0.0525 Epoch: 12 Global Step: 61660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:33,520-Speed 3401.75 samples/sec Loss 6.6973 LearningRate 0.0525 Epoch: 12 Global Step: 61670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:36,779-Speed 3143.21 samples/sec Loss 6.9547 LearningRate 0.0525 Epoch: 12 Global Step: 61680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:39,771-Speed 3423.01 samples/sec Loss 6.7411 LearningRate 0.0525 Epoch: 12 Global Step: 61690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-19 23:17:42,792-Speed 3390.73 samples/sec Loss 6.8212 LearningRate 0.0524 Epoch: 12 Global Step: 61700 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-19 23:17:45,794-Speed 3411.42 samples/sec Loss 6.7266 LearningRate 0.0524 Epoch: 12 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:48,782-Speed 3428.06 samples/sec Loss 6.6394 LearningRate 0.0524 Epoch: 12 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:51,790-Speed 3405.19 samples/sec Loss 6.6337 LearningRate 0.0524 Epoch: 12 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:54,794-Speed 3409.98 samples/sec Loss 6.8504 LearningRate 0.0524 Epoch: 12 Global Step: 61740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:17:57,793-Speed 3414.83 samples/sec Loss 6.4528 LearningRate 0.0524 Epoch: 12 Global Step: 61750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:00,774-Speed 3436.57 samples/sec Loss 6.6694 LearningRate 0.0523 Epoch: 12 Global Step: 61760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:03,804-Speed 3381.03 samples/sec Loss 6.7591 LearningRate 0.0523 Epoch: 12 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:06,792-Speed 3428.21 samples/sec Loss 6.6558 LearningRate 0.0523 Epoch: 12 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:09,773-Speed 3435.23 samples/sec Loss 6.7668 LearningRate 0.0523 Epoch: 12 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:12,757-Speed 3432.99 samples/sec Loss 6.6434 LearningRate 0.0523 Epoch: 12 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:15,746-Speed 3426.58 samples/sec Loss 6.7521 LearningRate 0.0523 Epoch: 12 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:18:18,777-Speed 3380.15 samples/sec Loss 6.8440 LearningRate 0.0522 Epoch: 12 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:18:21,759-Speed 3434.42 
samples/sec Loss 6.7850 LearningRate 0.0522 Epoch: 12 Global Step: 61830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:24,800-Speed 3367.90 samples/sec Loss 6.7255 LearningRate 0.0522 Epoch: 12 Global Step: 61840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:27,806-Speed 3407.89 samples/sec Loss 6.8265 LearningRate 0.0522 Epoch: 12 Global Step: 61850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:30,787-Speed 3435.06 samples/sec Loss 6.7736 LearningRate 0.0522 Epoch: 12 Global Step: 61860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:33,777-Speed 3426.11 samples/sec Loss 6.8836 LearningRate 0.0522 Epoch: 12 Global Step: 61870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:36,797-Speed 3391.81 samples/sec Loss 6.7068 LearningRate 0.0521 Epoch: 12 Global Step: 61880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:39,877-Speed 3325.16 samples/sec Loss 6.7649 LearningRate 0.0521 Epoch: 12 Global Step: 61890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:42,938-Speed 3346.51 samples/sec Loss 6.9066 LearningRate 0.0521 Epoch: 12 Global Step: 61900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:45,962-Speed 3387.89 samples/sec Loss 6.7097 LearningRate 0.0521 Epoch: 12 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:48,992-Speed 3379.62 samples/sec Loss 6.7996 LearningRate 0.0521 Epoch: 12 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:18:51,976-Speed 3433.32 samples/sec Loss 6.8591 LearningRate 0.0521 Epoch: 12 Global Step: 61930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:18:54,969-Speed 3421.33 samples/sec Loss 6.8557 LearningRate 0.0521 Epoch: 12 Global Step: 61940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:18:57,954-Speed 3432.91 samples/sec Loss 6.7562 LearningRate 0.0520 Epoch: 12 
Global Step: 61950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:19:00,963-Speed 3403.88 samples/sec Loss 6.7845 LearningRate 0.0520 Epoch: 12 Global Step: 61960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:19:03,951-Speed 3428.06 samples/sec Loss 6.8325 LearningRate 0.0520 Epoch: 12 Global Step: 61970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:19:06,921-Speed 3448.34 samples/sec Loss 6.6680 LearningRate 0.0520 Epoch: 12 Global Step: 61980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:19:09,908-Speed 3429.17 samples/sec Loss 6.7206 LearningRate 0.0520 Epoch: 12 Global Step: 61990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:19:12,991-Speed 3321.95 samples/sec Loss 6.6977 LearningRate 0.0520 Epoch: 12 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:19:55,849-[lfw][62000]XNorm: 22.742045 Training: 2022-01-19 23:19:55,849-[lfw][62000]Accuracy-Flip: 0.99717+-0.00289 Training: 2022-01-19 23:19:55,850-[lfw][62000]Accuracy-Highest: 0.99767 Training: 2022-01-19 23:20:45,653-[cfp_fp][62000]XNorm: 20.180982 Training: 2022-01-19 23:20:45,654-[cfp_fp][62000]Accuracy-Flip: 0.96700+-0.00760 Training: 2022-01-19 23:20:45,654-[cfp_fp][62000]Accuracy-Highest: 0.96829 Training: 2022-01-19 23:21:28,330-[agedb_30][62000]XNorm: 22.384993 Training: 2022-01-19 23:21:28,331-[agedb_30][62000]Accuracy-Flip: 0.97750+-0.00655 Training: 2022-01-19 23:21:28,331-[agedb_30][62000]Accuracy-Highest: 0.97750 Training: 2022-01-19 23:21:31,308-Speed 74.03 samples/sec Loss 6.8023 LearningRate 0.0519 Epoch: 12 Global Step: 62010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:21:34,315-Speed 3406.29 samples/sec Loss 6.8663 LearningRate 0.0519 Epoch: 12 Global Step: 62020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:21:37,312-Speed 3417.22 samples/sec Loss 6.7970 LearningRate 0.0519 Epoch: 12 Global Step: 62030 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-01-19 23:21:40,287-Speed 3443.44 samples/sec Loss 6.7783 LearningRate 0.0519 Epoch: 12 Global Step: 62040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:21:43,265-Speed 3439.05 samples/sec Loss 6.6133 LearningRate 0.0519 Epoch: 12 Global Step: 62050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:21:46,247-Speed 3435.82 samples/sec Loss 6.5629 LearningRate 0.0519 Epoch: 12 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:21:49,251-Speed 3409.62 samples/sec Loss 6.7621 LearningRate 0.0518 Epoch: 12 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:21:52,240-Speed 3425.56 samples/sec Loss 6.8530 LearningRate 0.0518 Epoch: 12 Global Step: 62080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:21:55,254-Speed 3399.66 samples/sec Loss 6.6723 LearningRate 0.0518 Epoch: 12 Global Step: 62090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:21:58,229-Speed 3442.12 samples/sec Loss 6.7547 LearningRate 0.0518 Epoch: 12 Global Step: 62100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:01,251-Speed 3390.82 samples/sec Loss 6.6880 LearningRate 0.0518 Epoch: 12 Global Step: 62110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:04,289-Speed 3371.01 samples/sec Loss 6.7187 LearningRate 0.0518 Epoch: 12 Global Step: 62120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:07,276-Speed 3429.22 samples/sec Loss 6.7039 LearningRate 0.0517 Epoch: 12 Global Step: 62130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:10,251-Speed 3443.32 samples/sec Loss 6.8195 LearningRate 0.0517 Epoch: 12 Global Step: 62140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:13,248-Speed 3417.70 samples/sec Loss 6.6792 LearningRate 0.0517 Epoch: 12 Global Step: 62150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 
23:22:16,241-Speed 3422.94 samples/sec Loss 6.6562 LearningRate 0.0517 Epoch: 12 Global Step: 62160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:19,229-Speed 3427.81 samples/sec Loss 6.7391 LearningRate 0.0517 Epoch: 12 Global Step: 62170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:22,209-Speed 3436.84 samples/sec Loss 6.7201 LearningRate 0.0517 Epoch: 12 Global Step: 62180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:25,218-Speed 3404.37 samples/sec Loss 6.7531 LearningRate 0.0517 Epoch: 12 Global Step: 62190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:28,263-Speed 3364.30 samples/sec Loss 6.7353 LearningRate 0.0516 Epoch: 12 Global Step: 62200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:22:31,282-Speed 3393.42 samples/sec Loss 6.8323 LearningRate 0.0516 Epoch: 12 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:34,282-Speed 3413.98 samples/sec Loss 6.6373 LearningRate 0.0516 Epoch: 12 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:37,261-Speed 3438.54 samples/sec Loss 6.5431 LearningRate 0.0516 Epoch: 12 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:40,354-Speed 3311.13 samples/sec Loss 6.9151 LearningRate 0.0516 Epoch: 12 Global Step: 62240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:43,493-Speed 3262.99 samples/sec Loss 6.7200 LearningRate 0.0516 Epoch: 12 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:46,489-Speed 3418.54 samples/sec Loss 6.9424 LearningRate 0.0515 Epoch: 12 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:49,470-Speed 3436.59 samples/sec Loss 6.5970 LearningRate 0.0515 Epoch: 12 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:52,460-Speed 3425.30 samples/sec Loss 6.7579 
LearningRate 0.0515 Epoch: 12 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:55,504-Speed 3365.49 samples/sec Loss 6.7271 LearningRate 0.0515 Epoch: 12 Global Step: 62290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:22:58,630-Speed 3277.09 samples/sec Loss 6.7491 LearningRate 0.0515 Epoch: 12 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:01,609-Speed 3437.62 samples/sec Loss 6.7746 LearningRate 0.0515 Epoch: 12 Global Step: 62310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:04,587-Speed 3439.64 samples/sec Loss 6.6886 LearningRate 0.0514 Epoch: 12 Global Step: 62320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:07,548-Speed 3458.80 samples/sec Loss 6.7742 LearningRate 0.0514 Epoch: 12 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:10,525-Speed 3442.29 samples/sec Loss 6.8393 LearningRate 0.0514 Epoch: 12 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:13,502-Speed 3440.69 samples/sec Loss 6.7960 LearningRate 0.0514 Epoch: 12 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:16,508-Speed 3406.59 samples/sec Loss 6.7257 LearningRate 0.0514 Epoch: 12 Global Step: 62360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:19,674-Speed 3235.37 samples/sec Loss 6.8051 LearningRate 0.0514 Epoch: 12 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:22,827-Speed 3248.99 samples/sec Loss 6.8096 LearningRate 0.0513 Epoch: 12 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:25,856-Speed 3382.00 samples/sec Loss 6.7885 LearningRate 0.0513 Epoch: 12 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:28,974-Speed 3285.20 samples/sec Loss 6.6505 LearningRate 0.0513 Epoch: 12 Global Step: 62400 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:32,074-Speed 3303.48 samples/sec Loss 6.6688 LearningRate 0.0513 Epoch: 12 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:35,053-Speed 3439.20 samples/sec Loss 6.8479 LearningRate 0.0513 Epoch: 12 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:23:38,052-Speed 3415.06 samples/sec Loss 6.7167 LearningRate 0.0513 Epoch: 12 Global Step: 62430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:41,155-Speed 3301.30 samples/sec Loss 6.7220 LearningRate 0.0512 Epoch: 12 Global Step: 62440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:44,166-Speed 3402.13 samples/sec Loss 6.7789 LearningRate 0.0512 Epoch: 12 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:47,198-Speed 3377.36 samples/sec Loss 6.7628 LearningRate 0.0512 Epoch: 12 Global Step: 62460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:50,335-Speed 3265.97 samples/sec Loss 6.7669 LearningRate 0.0512 Epoch: 12 Global Step: 62470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:53,451-Speed 3286.64 samples/sec Loss 6.8971 LearningRate 0.0512 Epoch: 12 Global Step: 62480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:56,475-Speed 3387.96 samples/sec Loss 6.8845 LearningRate 0.0512 Epoch: 12 Global Step: 62490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:23:59,665-Speed 3210.62 samples/sec Loss 6.8605 LearningRate 0.0512 Epoch: 12 Global Step: 62500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-19 23:24:02,685-Speed 3390.98 samples/sec Loss 6.6896 LearningRate 0.0511 Epoch: 12 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:24:05,662-Speed 3440.97 samples/sec Loss 6.7307 LearningRate 0.0511 Epoch: 12 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-19 23:24:08,636-Speed 3444.22 samples/sec Loss 6.7405 LearningRate 0.0511 Epoch: 12 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-19 23:24:11,624-Speed 3427.22 samples/sec Loss 6.7744 LearningRate 0.0511 Epoch: 12 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:14,604-Speed 3437.11 samples/sec Loss 6.8781 LearningRate 0.0511 Epoch: 12 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:17,676-Speed 3335.34 samples/sec Loss 6.7143 LearningRate 0.0511 Epoch: 12 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:20,653-Speed 3440.25 samples/sec Loss 6.5617 LearningRate 0.0510 Epoch: 12 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:23,644-Speed 3425.12 samples/sec Loss 6.8467 LearningRate 0.0510 Epoch: 12 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:26,643-Speed 3415.14 samples/sec Loss 6.8249 LearningRate 0.0510 Epoch: 12 Global Step: 62590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:29,649-Speed 3407.18 samples/sec Loss 6.7982 LearningRate 0.0510 Epoch: 12 Global Step: 62600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:32,639-Speed 3424.92 samples/sec Loss 6.8789 LearningRate 0.0510 Epoch: 12 Global Step: 62610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:24:35,654-Speed 3398.88 samples/sec Loss 6.6017 LearningRate 0.0510 Epoch: 12 Global Step: 62620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:38,666-Speed 3400.68 samples/sec Loss 6.6739 LearningRate 0.0509 Epoch: 12 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:41,689-Speed 3387.66 samples/sec Loss 6.8170 LearningRate 0.0509 Epoch: 12 Global Step: 62640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:44,681-Speed 3424.85 samples/sec Loss 
6.7726 LearningRate 0.0509 Epoch: 12 Global Step: 62650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:47,667-Speed 3429.73 samples/sec Loss 6.7210 LearningRate 0.0509 Epoch: 12 Global Step: 62660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:50,644-Speed 3441.68 samples/sec Loss 6.7593 LearningRate 0.0509 Epoch: 12 Global Step: 62670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:53,631-Speed 3429.97 samples/sec Loss 6.5385 LearningRate 0.0509 Epoch: 12 Global Step: 62680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:56,648-Speed 3394.62 samples/sec Loss 6.7610 LearningRate 0.0508 Epoch: 12 Global Step: 62690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:24:59,713-Speed 3341.73 samples/sec Loss 6.7255 LearningRate 0.0508 Epoch: 12 Global Step: 62700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:02,816-Speed 3300.96 samples/sec Loss 6.8169 LearningRate 0.0508 Epoch: 12 Global Step: 62710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:05,808-Speed 3423.26 samples/sec Loss 6.8619 LearningRate 0.0508 Epoch: 12 Global Step: 62720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:25:08,856-Speed 3360.04 samples/sec Loss 6.8069 LearningRate 0.0508 Epoch: 12 Global Step: 62730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:25:11,843-Speed 3429.84 samples/sec Loss 6.8880 LearningRate 0.0508 Epoch: 12 Global Step: 62740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:25:14,822-Speed 3438.42 samples/sec Loss 6.7331 LearningRate 0.0508 Epoch: 12 Global Step: 62750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:25:17,804-Speed 3434.72 samples/sec Loss 6.8236 LearningRate 0.0507 Epoch: 12 Global Step: 62760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:25:20,784-Speed 3437.21 samples/sec Loss 6.8267 LearningRate 0.0507 Epoch: 12 Global Step: 
62770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:25:23,749-Speed 3454.62 samples/sec Loss 6.7868 LearningRate 0.0507 Epoch: 12 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:26,774-Speed 3385.86 samples/sec Loss 6.7283 LearningRate 0.0507 Epoch: 12 Global Step: 62790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:29,774-Speed 3414.48 samples/sec Loss 6.7622 LearningRate 0.0507 Epoch: 12 Global Step: 62800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:32,758-Speed 3432.75 samples/sec Loss 6.6347 LearningRate 0.0507 Epoch: 12 Global Step: 62810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:35,761-Speed 3410.51 samples/sec Loss 6.7574 LearningRate 0.0506 Epoch: 12 Global Step: 62820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:38,787-Speed 3385.63 samples/sec Loss 6.6740 LearningRate 0.0506 Epoch: 12 Global Step: 62830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:41,825-Speed 3371.40 samples/sec Loss 6.7101 LearningRate 0.0506 Epoch: 12 Global Step: 62840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:44,813-Speed 3428.78 samples/sec Loss 6.7929 LearningRate 0.0506 Epoch: 12 Global Step: 62850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:47,972-Speed 3241.49 samples/sec Loss 6.6597 LearningRate 0.0506 Epoch: 12 Global Step: 62860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:51,025-Speed 3355.21 samples/sec Loss 6.7262 LearningRate 0.0506 Epoch: 12 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:25:54,010-Speed 3431.62 samples/sec Loss 6.6946 LearningRate 0.0505 Epoch: 12 Global Step: 62880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:25:57,006-Speed 3418.49 samples/sec Loss 6.6836 LearningRate 0.0505 Epoch: 12 Global Step: 62890 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-01-19 23:26:00,013-Speed 3407.02 samples/sec Loss 6.7149 LearningRate 0.0505 Epoch: 12 Global Step: 62900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:26:02,982-Speed 3449.43 samples/sec Loss 6.7434 LearningRate 0.0505 Epoch: 12 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:05,969-Speed 3429.60 samples/sec Loss 6.7386 LearningRate 0.0505 Epoch: 12 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:08,972-Speed 3410.42 samples/sec Loss 6.6032 LearningRate 0.0505 Epoch: 12 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:11,966-Speed 3421.68 samples/sec Loss 6.7055 LearningRate 0.0505 Epoch: 12 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:14,953-Speed 3429.00 samples/sec Loss 6.8314 LearningRate 0.0504 Epoch: 12 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:17,939-Speed 3430.25 samples/sec Loss 6.8027 LearningRate 0.0504 Epoch: 12 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:20,965-Speed 3384.40 samples/sec Loss 6.6678 LearningRate 0.0504 Epoch: 12 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:23,962-Speed 3417.76 samples/sec Loss 6.6535 LearningRate 0.0504 Epoch: 12 Global Step: 62980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:26,976-Speed 3397.90 samples/sec Loss 6.7275 LearningRate 0.0504 Epoch: 12 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:29,982-Speed 3407.91 samples/sec Loss 6.6848 LearningRate 0.0504 Epoch: 12 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:32,961-Speed 3437.62 samples/sec Loss 6.7147 LearningRate 0.0503 Epoch: 12 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:26:35,961-Speed 3416.12 
samples/sec Loss 6.7983 LearningRate 0.0503 Epoch: 12 Global Step: 63020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:26:39,074-Speed 3290.40 samples/sec Loss 6.6288 LearningRate 0.0503 Epoch: 12 Global Step: 63030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:26:42,045-Speed 3447.26 samples/sec Loss 6.7778 LearningRate 0.0503 Epoch: 12 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:45,094-Speed 3358.86 samples/sec Loss 6.7011 LearningRate 0.0503 Epoch: 12 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:26:48,053-Speed 3462.12 samples/sec Loss 6.7271 LearningRate 0.0503 Epoch: 12 Global Step: 63060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:26:51,094-Speed 3368.39 samples/sec Loss 6.7823 LearningRate 0.0502 Epoch: 12 Global Step: 63070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:26:54,104-Speed 3402.60 samples/sec Loss 6.6596 LearningRate 0.0502 Epoch: 12 Global Step: 63080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:26:57,100-Speed 3418.62 samples/sec Loss 6.7123 LearningRate 0.0502 Epoch: 12 Global Step: 63090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:00,085-Speed 3431.82 samples/sec Loss 6.7856 LearningRate 0.0502 Epoch: 12 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:03,077-Speed 3423.07 samples/sec Loss 6.6415 LearningRate 0.0502 Epoch: 12 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:06,081-Speed 3410.55 samples/sec Loss 6.7858 LearningRate 0.0502 Epoch: 12 Global Step: 63120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:09,222-Speed 3260.12 samples/sec Loss 6.5902 LearningRate 0.0502 Epoch: 12 Global Step: 63130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:12,304-Speed 3323.15 samples/sec Loss 6.7612 LearningRate 0.0501 Epoch: 12 
Global Step: 63140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:15,509-Speed 3196.14 samples/sec Loss 6.6630 LearningRate 0.0501 Epoch: 12 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:18,496-Speed 3429.20 samples/sec Loss 6.7367 LearningRate 0.0501 Epoch: 12 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:27:21,564-Speed 3339.57 samples/sec Loss 6.7708 LearningRate 0.0501 Epoch: 12 Global Step: 63170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:27:24,556-Speed 3422.70 samples/sec Loss 6.6443 LearningRate 0.0501 Epoch: 12 Global Step: 63180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:27:27,534-Speed 3439.16 samples/sec Loss 6.7651 LearningRate 0.0501 Epoch: 12 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:27:30,512-Speed 3439.98 samples/sec Loss 6.7275 LearningRate 0.0500 Epoch: 12 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:27:33,493-Speed 3435.83 samples/sec Loss 6.8737 LearningRate 0.0500 Epoch: 12 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:27:36,472-Speed 3438.43 samples/sec Loss 6.7203 LearningRate 0.0500 Epoch: 12 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:27:39,437-Speed 3455.26 samples/sec Loss 6.7582 LearningRate 0.0500 Epoch: 12 Global Step: 63230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:42,423-Speed 3429.73 samples/sec Loss 6.7096 LearningRate 0.0500 Epoch: 12 Global Step: 63240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:45,540-Speed 3286.67 samples/sec Loss 6.7031 LearningRate 0.0500 Epoch: 12 Global Step: 63250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:48,616-Speed 3328.69 samples/sec Loss 6.5930 LearningRate 0.0499 Epoch: 12 Global Step: 63260 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-01-19 23:27:51,803-Speed 3215.09 samples/sec Loss 6.6625 LearningRate 0.0499 Epoch: 12 Global Step: 63270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:54,943-Speed 3261.98 samples/sec Loss 6.6659 LearningRate 0.0499 Epoch: 12 Global Step: 63280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:27:57,946-Speed 3410.32 samples/sec Loss 6.8286 LearningRate 0.0499 Epoch: 12 Global Step: 63290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:28:00,962-Speed 3396.35 samples/sec Loss 6.7230 LearningRate 0.0499 Epoch: 12 Global Step: 63300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:28:04,013-Speed 3356.88 samples/sec Loss 6.7319 LearningRate 0.0499 Epoch: 12 Global Step: 63310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:28:07,055-Speed 3366.83 samples/sec Loss 6.6080 LearningRate 0.0498 Epoch: 12 Global Step: 63320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:28:10,034-Speed 3439.33 samples/sec Loss 6.6236 LearningRate 0.0498 Epoch: 12 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:13,108-Speed 3331.90 samples/sec Loss 6.7465 LearningRate 0.0498 Epoch: 12 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:16,258-Speed 3252.03 samples/sec Loss 6.7776 LearningRate 0.0498 Epoch: 12 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:19,360-Speed 3300.95 samples/sec Loss 6.7088 LearningRate 0.0498 Epoch: 12 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:22,521-Speed 3241.26 samples/sec Loss 6.6255 LearningRate 0.0498 Epoch: 12 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:25,530-Speed 3404.22 samples/sec Loss 6.7397 LearningRate 0.0498 Epoch: 12 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:28,506-Speed 3441.03 
samples/sec Loss 6.6934 LearningRate 0.0497 Epoch: 12 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:31,487-Speed 3436.23 samples/sec Loss 6.7057 LearningRate 0.0497 Epoch: 12 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:34,481-Speed 3421.47 samples/sec Loss 6.7269 LearningRate 0.0497 Epoch: 12 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:37,490-Speed 3404.07 samples/sec Loss 6.9594 LearningRate 0.0497 Epoch: 12 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:40,520-Speed 3380.43 samples/sec Loss 6.7305 LearningRate 0.0497 Epoch: 12 Global Step: 63430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:28:43,511-Speed 3424.32 samples/sec Loss 6.5756 LearningRate 0.0497 Epoch: 12 Global Step: 63440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:28:46,495-Speed 3432.20 samples/sec Loss 6.8177 LearningRate 0.0496 Epoch: 12 Global Step: 63450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:28:49,470-Speed 3443.24 samples/sec Loss 6.8799 LearningRate 0.0496 Epoch: 12 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:52,449-Speed 3438.77 samples/sec Loss 6.6728 LearningRate 0.0496 Epoch: 12 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:55,496-Speed 3361.78 samples/sec Loss 6.7268 LearningRate 0.0496 Epoch: 12 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:28:58,615-Speed 3283.86 samples/sec Loss 6.7668 LearningRate 0.0496 Epoch: 12 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:01,678-Speed 3344.58 samples/sec Loss 6.6626 LearningRate 0.0496 Epoch: 12 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:04,680-Speed 3412.48 samples/sec Loss 6.7225 LearningRate 0.0496 Epoch: 
12 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:07,708-Speed 3381.62 samples/sec Loss 6.7310 LearningRate 0.0495 Epoch: 12 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:10,691-Speed 3435.07 samples/sec Loss 6.5729 LearningRate 0.0495 Epoch: 12 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:13,673-Speed 3433.69 samples/sec Loss 6.8436 LearningRate 0.0495 Epoch: 12 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:16,672-Speed 3416.82 samples/sec Loss 6.7497 LearningRate 0.0495 Epoch: 12 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:19,738-Speed 3340.42 samples/sec Loss 6.7185 LearningRate 0.0495 Epoch: 12 Global Step: 63560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:29:22,821-Speed 3322.66 samples/sec Loss 6.7258 LearningRate 0.0495 Epoch: 12 Global Step: 63570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:25,834-Speed 3399.05 samples/sec Loss 6.5486 LearningRate 0.0494 Epoch: 12 Global Step: 63580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:28,814-Speed 3437.32 samples/sec Loss 6.8174 LearningRate 0.0494 Epoch: 12 Global Step: 63590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:31,797-Speed 3434.21 samples/sec Loss 6.5978 LearningRate 0.0494 Epoch: 12 Global Step: 63600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:34,787-Speed 3425.50 samples/sec Loss 6.7349 LearningRate 0.0494 Epoch: 12 Global Step: 63610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:37,775-Speed 3427.86 samples/sec Loss 6.6306 LearningRate 0.0494 Epoch: 12 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:40,754-Speed 3437.97 samples/sec Loss 6.5952 LearningRate 0.0494 Epoch: 12 Global Step: 63630 Fp16 Grad Scale: 65536 Required: 
6 hours Training: 2022-01-19 23:29:43,737-Speed 3433.55 samples/sec Loss 6.6556 LearningRate 0.0493 Epoch: 12 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:46,722-Speed 3432.39 samples/sec Loss 6.7068 LearningRate 0.0493 Epoch: 12 Global Step: 63650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:49,706-Speed 3432.45 samples/sec Loss 6.5361 LearningRate 0.0493 Epoch: 12 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:52,697-Speed 3424.39 samples/sec Loss 6.5300 LearningRate 0.0493 Epoch: 12 Global Step: 63670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:29:55,674-Speed 3440.62 samples/sec Loss 6.6920 LearningRate 0.0493 Epoch: 12 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:29:58,676-Speed 3411.57 samples/sec Loss 6.6667 LearningRate 0.0493 Epoch: 12 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:01,713-Speed 3373.07 samples/sec Loss 6.7902 LearningRate 0.0493 Epoch: 12 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:04,695-Speed 3435.44 samples/sec Loss 6.7649 LearningRate 0.0492 Epoch: 12 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:07,701-Speed 3406.61 samples/sec Loss 6.6046 LearningRate 0.0492 Epoch: 12 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:10,692-Speed 3425.48 samples/sec Loss 6.5854 LearningRate 0.0492 Epoch: 12 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:13,702-Speed 3402.96 samples/sec Loss 6.5413 LearningRate 0.0492 Epoch: 12 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:16,691-Speed 3427.21 samples/sec Loss 6.6101 LearningRate 0.0492 Epoch: 12 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:19,670-Speed 
3437.65 samples/sec Loss 6.6158 LearningRate 0.0492 Epoch: 12 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:22,655-Speed 3431.47 samples/sec Loss 6.5956 LearningRate 0.0491 Epoch: 12 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:25,640-Speed 3431.12 samples/sec Loss 6.5649 LearningRate 0.0491 Epoch: 12 Global Step: 63780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:30:28,627-Speed 3429.50 samples/sec Loss 6.7542 LearningRate 0.0491 Epoch: 12 Global Step: 63790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:30:31,616-Speed 3427.37 samples/sec Loss 6.8257 LearningRate 0.0491 Epoch: 12 Global Step: 63800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:30:34,617-Speed 3412.94 samples/sec Loss 6.8366 LearningRate 0.0491 Epoch: 12 Global Step: 63810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:30:37,606-Speed 3425.81 samples/sec Loss 6.6437 LearningRate 0.0491 Epoch: 12 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:30:40,610-Speed 3410.48 samples/sec Loss 6.5943 LearningRate 0.0490 Epoch: 12 Global Step: 63830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:30:43,592-Speed 3435.14 samples/sec Loss 6.6983 LearningRate 0.0490 Epoch: 12 Global Step: 63840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:30:46,559-Speed 3451.70 samples/sec Loss 6.7071 LearningRate 0.0490 Epoch: 12 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:49,538-Speed 3438.95 samples/sec Loss 6.7139 LearningRate 0.0490 Epoch: 12 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:52,520-Speed 3434.89 samples/sec Loss 6.7354 LearningRate 0.0490 Epoch: 12 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:55,502-Speed 3434.54 samples/sec Loss 6.7302 LearningRate 
0.0490 Epoch: 12 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:30:58,497-Speed 3419.55 samples/sec Loss 6.7208 LearningRate 0.0490 Epoch: 12 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:31:01,500-Speed 3411.85 samples/sec Loss 6.7920 LearningRate 0.0489 Epoch: 12 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:31:04,484-Speed 3431.25 samples/sec Loss 6.6993 LearningRate 0.0489 Epoch: 12 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:31:07,470-Speed 3431.31 samples/sec Loss 6.6473 LearningRate 0.0489 Epoch: 12 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:31:10,448-Speed 3439.02 samples/sec Loss 6.6112 LearningRate 0.0489 Epoch: 12 Global Step: 63930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:31:13,440-Speed 3424.28 samples/sec Loss 6.5599 LearningRate 0.0489 Epoch: 12 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:31:16,440-Speed 3413.86 samples/sec Loss 6.4524 LearningRate 0.0489 Epoch: 12 Global Step: 63950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:31:19,413-Speed 3445.34 samples/sec Loss 6.7645 LearningRate 0.0488 Epoch: 12 Global Step: 63960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:31:22,435-Speed 3389.32 samples/sec Loss 6.6747 LearningRate 0.0488 Epoch: 12 Global Step: 63970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:31:25,450-Speed 3396.88 samples/sec Loss 6.6005 LearningRate 0.0488 Epoch: 12 Global Step: 63980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:31:28,523-Speed 3333.18 samples/sec Loss 6.6690 LearningRate 0.0488 Epoch: 12 Global Step: 63990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:31:31,522-Speed 3415.00 samples/sec Loss 6.4921 LearningRate 0.0488 Epoch: 12 Global Step: 64000 Fp16 Grad Scale: 
32768 Required: 6 hours Training: 2022-01-19 23:32:15,300-[lfw][64000]XNorm: 21.816425 Training: 2022-01-19 23:32:15,301-[lfw][64000]Accuracy-Flip: 0.99750+-0.00300 Training: 2022-01-19 23:32:15,301-[lfw][64000]Accuracy-Highest: 0.99767 Training: 2022-01-19 23:33:05,630-[cfp_fp][64000]XNorm: 19.232091 Training: 2022-01-19 23:33:05,631-[cfp_fp][64000]Accuracy-Flip: 0.97029+-0.00761 Training: 2022-01-19 23:33:05,632-[cfp_fp][64000]Accuracy-Highest: 0.97029 Training: 2022-01-19 23:33:48,949-[agedb_30][64000]XNorm: 21.309324 Training: 2022-01-19 23:33:48,950-[agedb_30][64000]Accuracy-Flip: 0.97800+-0.00674 Training: 2022-01-19 23:33:48,950-[agedb_30][64000]Accuracy-Highest: 0.97800 Training: 2022-01-19 23:33:51,976-Speed 72.91 samples/sec Loss 6.6926 LearningRate 0.0488 Epoch: 12 Global Step: 64010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:33:54,988-Speed 3399.80 samples/sec Loss 6.6596 LearningRate 0.0488 Epoch: 12 Global Step: 64020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:33:58,021-Speed 3377.51 samples/sec Loss 6.6118 LearningRate 0.0487 Epoch: 12 Global Step: 64030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:34:00,991-Speed 3449.56 samples/sec Loss 6.7799 LearningRate 0.0487 Epoch: 12 Global Step: 64040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:34:03,957-Speed 3452.95 samples/sec Loss 6.7425 LearningRate 0.0487 Epoch: 12 Global Step: 64050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:34:06,929-Speed 3446.26 samples/sec Loss 6.6617 LearningRate 0.0487 Epoch: 12 Global Step: 64060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:34:09,963-Speed 3376.09 samples/sec Loss 6.5765 LearningRate 0.0487 Epoch: 12 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:12,956-Speed 3421.66 samples/sec Loss 6.5736 LearningRate 0.0487 Epoch: 12 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-19 23:34:15,954-Speed 3417.70 samples/sec Loss 6.6890 LearningRate 0.0486 Epoch: 12 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:18,985-Speed 3378.51 samples/sec Loss 6.5583 LearningRate 0.0486 Epoch: 12 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:22,020-Speed 3375.36 samples/sec Loss 6.5434 LearningRate 0.0486 Epoch: 12 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:24,996-Speed 3442.56 samples/sec Loss 6.5459 LearningRate 0.0486 Epoch: 12 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:28,006-Speed 3402.62 samples/sec Loss 6.7014 LearningRate 0.0486 Epoch: 12 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:31,045-Speed 3370.76 samples/sec Loss 6.6435 LearningRate 0.0486 Epoch: 12 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:34,021-Speed 3441.14 samples/sec Loss 6.6076 LearningRate 0.0485 Epoch: 12 Global Step: 64150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:36,998-Speed 3441.17 samples/sec Loss 6.6440 LearningRate 0.0485 Epoch: 12 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:39,980-Speed 3434.11 samples/sec Loss 6.6628 LearningRate 0.0485 Epoch: 12 Global Step: 64170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:34:42,959-Speed 3438.14 samples/sec Loss 6.6436 LearningRate 0.0485 Epoch: 12 Global Step: 64180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:34:45,945-Speed 3431.16 samples/sec Loss 6.5957 LearningRate 0.0485 Epoch: 12 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:48,917-Speed 3445.94 samples/sec Loss 6.5383 LearningRate 0.0485 Epoch: 12 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:51,908-Speed 3425.06 samples/sec 
Loss 6.4824 LearningRate 0.0485 Epoch: 12 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:54,881-Speed 3444.41 samples/sec Loss 6.7261 LearningRate 0.0484 Epoch: 12 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:34:57,897-Speed 3396.14 samples/sec Loss 6.5690 LearningRate 0.0484 Epoch: 12 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:00,905-Speed 3405.59 samples/sec Loss 6.6281 LearningRate 0.0484 Epoch: 12 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:03,911-Speed 3406.97 samples/sec Loss 6.6072 LearningRate 0.0484 Epoch: 12 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:06,898-Speed 3429.59 samples/sec Loss 6.5635 LearningRate 0.0484 Epoch: 12 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:09,906-Speed 3404.77 samples/sec Loss 6.6360 LearningRate 0.0484 Epoch: 12 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:12,883-Speed 3441.14 samples/sec Loss 6.6629 LearningRate 0.0483 Epoch: 12 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:15,875-Speed 3422.65 samples/sec Loss 6.5779 LearningRate 0.0483 Epoch: 12 Global Step: 64290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:35:18,850-Speed 3442.68 samples/sec Loss 6.5888 LearningRate 0.0483 Epoch: 12 Global Step: 64300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:35:21,830-Speed 3438.56 samples/sec Loss 6.5464 LearningRate 0.0483 Epoch: 12 Global Step: 64310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:35:24,822-Speed 3423.17 samples/sec Loss 6.6737 LearningRate 0.0483 Epoch: 12 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:35:27,784-Speed 3457.52 samples/sec Loss 6.5957 LearningRate 0.0483 Epoch: 12 Global 
Step: 64330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:30,761-Speed 3440.73 samples/sec Loss 6.6400 LearningRate 0.0483 Epoch: 12 Global Step: 64340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:33,746-Speed 3431.91 samples/sec Loss 6.5452 LearningRate 0.0482 Epoch: 12 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:36,734-Speed 3428.37 samples/sec Loss 6.8016 LearningRate 0.0482 Epoch: 12 Global Step: 64360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:39,757-Speed 3387.54 samples/sec Loss 6.5294 LearningRate 0.0482 Epoch: 12 Global Step: 64370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:42,810-Speed 3354.59 samples/sec Loss 6.5757 LearningRate 0.0482 Epoch: 12 Global Step: 64380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:45,799-Speed 3439.06 samples/sec Loss 6.6579 LearningRate 0.0482 Epoch: 12 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:48,776-Speed 3440.54 samples/sec Loss 6.6908 LearningRate 0.0482 Epoch: 12 Global Step: 64400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:51,764-Speed 3429.00 samples/sec Loss 6.5729 LearningRate 0.0481 Epoch: 12 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:54,753-Speed 3426.26 samples/sec Loss 6.6653 LearningRate 0.0481 Epoch: 12 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:35:57,732-Speed 3438.84 samples/sec Loss 6.6622 LearningRate 0.0481 Epoch: 12 Global Step: 64430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:00,693-Speed 3458.75 samples/sec Loss 6.6344 LearningRate 0.0481 Epoch: 12 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:03,717-Speed 3387.14 samples/sec Loss 6.4744 LearningRate 0.0481 Epoch: 12 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-19 23:36:06,723-Speed 3408.40 samples/sec Loss 6.5263 LearningRate 0.0481 Epoch: 12 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:09,716-Speed 3421.26 samples/sec Loss 6.5569 LearningRate 0.0481 Epoch: 12 Global Step: 64470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:12,711-Speed 3420.29 samples/sec Loss 6.6469 LearningRate 0.0480 Epoch: 12 Global Step: 64480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:15,726-Speed 3397.59 samples/sec Loss 6.4914 LearningRate 0.0480 Epoch: 12 Global Step: 64490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:18,763-Speed 3372.94 samples/sec Loss 6.6663 LearningRate 0.0480 Epoch: 12 Global Step: 64500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:21,775-Speed 3400.89 samples/sec Loss 6.6240 LearningRate 0.0480 Epoch: 12 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:24,771-Speed 3417.99 samples/sec Loss 6.5777 LearningRate 0.0480 Epoch: 12 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:27,798-Speed 3383.52 samples/sec Loss 6.5800 LearningRate 0.0480 Epoch: 12 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:36:30,870-Speed 3334.73 samples/sec Loss 6.6950 LearningRate 0.0479 Epoch: 12 Global Step: 64540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:33,891-Speed 3390.11 samples/sec Loss 6.5397 LearningRate 0.0479 Epoch: 12 Global Step: 64550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:36,996-Speed 3298.75 samples/sec Loss 6.6525 LearningRate 0.0479 Epoch: 12 Global Step: 64560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:39,980-Speed 3432.47 samples/sec Loss 6.6866 LearningRate 0.0479 Epoch: 12 Global Step: 64570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:43,068-Speed 3317.18 
samples/sec Loss 6.5806 LearningRate 0.0479 Epoch: 12 Global Step: 64580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:46,055-Speed 3430.23 samples/sec Loss 6.6568 LearningRate 0.0479 Epoch: 12 Global Step: 64590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:49,061-Speed 3407.18 samples/sec Loss 6.5518 LearningRate 0.0478 Epoch: 12 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:52,150-Speed 3315.63 samples/sec Loss 6.6410 LearningRate 0.0478 Epoch: 12 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:55,162-Speed 3400.66 samples/sec Loss 6.5470 LearningRate 0.0478 Epoch: 12 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:36:58,169-Speed 3406.71 samples/sec Loss 6.6444 LearningRate 0.0478 Epoch: 12 Global Step: 64630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:01,157-Speed 3427.27 samples/sec Loss 6.6350 LearningRate 0.0478 Epoch: 12 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:04,149-Speed 3423.51 samples/sec Loss 6.4331 LearningRate 0.0478 Epoch: 12 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:07,123-Speed 3444.56 samples/sec Loss 6.6427 LearningRate 0.0478 Epoch: 12 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:10,195-Speed 3333.83 samples/sec Loss 6.5784 LearningRate 0.0477 Epoch: 12 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:13,234-Speed 3370.56 samples/sec Loss 6.5649 LearningRate 0.0477 Epoch: 12 Global Step: 64680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:16,208-Speed 3444.59 samples/sec Loss 6.4443 LearningRate 0.0477 Epoch: 12 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:19,192-Speed 3432.27 samples/sec Loss 6.5420 LearningRate 0.0477 Epoch: 
12 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:22,167-Speed 3442.91 samples/sec Loss 6.5220 LearningRate 0.0477 Epoch: 12 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:25,145-Speed 3439.11 samples/sec Loss 6.6480 LearningRate 0.0477 Epoch: 12 Global Step: 64720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:28,124-Speed 3438.60 samples/sec Loss 6.7160 LearningRate 0.0476 Epoch: 12 Global Step: 64730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:37:31,110-Speed 3430.31 samples/sec Loss 6.6497 LearningRate 0.0476 Epoch: 12 Global Step: 64740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:37:34,094-Speed 3433.84 samples/sec Loss 6.6260 LearningRate 0.0476 Epoch: 12 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:37:37,110-Speed 3396.10 samples/sec Loss 6.6127 LearningRate 0.0476 Epoch: 12 Global Step: 64760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:40,126-Speed 3396.12 samples/sec Loss 6.5343 LearningRate 0.0476 Epoch: 12 Global Step: 64770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:43,148-Speed 3389.25 samples/sec Loss 6.5725 LearningRate 0.0476 Epoch: 12 Global Step: 64780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:46,171-Speed 3388.41 samples/sec Loss 6.6938 LearningRate 0.0476 Epoch: 12 Global Step: 64790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:49,195-Speed 3387.33 samples/sec Loss 6.6009 LearningRate 0.0475 Epoch: 12 Global Step: 64800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:52,222-Speed 3383.60 samples/sec Loss 6.7162 LearningRate 0.0475 Epoch: 12 Global Step: 64810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:37:55,230-Speed 3404.77 samples/sec Loss 6.6444 LearningRate 0.0475 Epoch: 12 Global Step: 64820 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-19 23:37:58,217-Speed 3429.41 samples/sec Loss 6.5693 LearningRate 0.0475 Epoch: 12 Global Step: 64830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:01,204-Speed 3429.58 samples/sec Loss 6.5996 LearningRate 0.0475 Epoch: 12 Global Step: 64840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:04,213-Speed 3403.34 samples/sec Loss 6.5310 LearningRate 0.0475 Epoch: 12 Global Step: 64850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:07,194-Speed 3437.09 samples/sec Loss 6.4848 LearningRate 0.0474 Epoch: 12 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:38:10,251-Speed 3350.24 samples/sec Loss 6.5742 LearningRate 0.0474 Epoch: 12 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:38:13,416-Speed 3235.96 samples/sec Loss 6.5566 LearningRate 0.0474 Epoch: 12 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:38:16,394-Speed 3439.49 samples/sec Loss 6.6193 LearningRate 0.0474 Epoch: 12 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:38:19,379-Speed 3431.65 samples/sec Loss 6.5957 LearningRate 0.0474 Epoch: 12 Global Step: 64900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:38:22,356-Speed 3440.88 samples/sec Loss 6.5762 LearningRate 0.0474 Epoch: 12 Global Step: 64910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:38:25,319-Speed 3455.91 samples/sec Loss 6.3906 LearningRate 0.0474 Epoch: 12 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:28,305-Speed 3430.87 samples/sec Loss 6.7152 LearningRate 0.0473 Epoch: 12 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:31,366-Speed 3345.46 samples/sec Loss 6.4074 LearningRate 0.0473 Epoch: 12 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 
23:38:34,346-Speed 3437.90 samples/sec Loss 6.6711 LearningRate 0.0473 Epoch: 12 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:37,324-Speed 3439.64 samples/sec Loss 6.5418 LearningRate 0.0473 Epoch: 12 Global Step: 64960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:40,317-Speed 3421.75 samples/sec Loss 6.6770 LearningRate 0.0473 Epoch: 12 Global Step: 64970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:43,301-Speed 3433.21 samples/sec Loss 6.5958 LearningRate 0.0473 Epoch: 12 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:46,277-Speed 3441.72 samples/sec Loss 6.4415 LearningRate 0.0472 Epoch: 12 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:49,259-Speed 3434.62 samples/sec Loss 6.5084 LearningRate 0.0472 Epoch: 12 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:52,263-Speed 3409.45 samples/sec Loss 6.6349 LearningRate 0.0472 Epoch: 12 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:38:55,248-Speed 3431.92 samples/sec Loss 6.6161 LearningRate 0.0472 Epoch: 12 Global Step: 65020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:38:58,236-Speed 3427.27 samples/sec Loss 6.6472 LearningRate 0.0472 Epoch: 12 Global Step: 65030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:39:01,214-Speed 3440.55 samples/sec Loss 6.6618 LearningRate 0.0472 Epoch: 12 Global Step: 65040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:39:04,208-Speed 3420.86 samples/sec Loss 6.5617 LearningRate 0.0472 Epoch: 12 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:39:07,214-Speed 3407.67 samples/sec Loss 6.7332 LearningRate 0.0471 Epoch: 12 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:39:10,207-Speed 3422.19 samples/sec Loss 6.5146 
LearningRate 0.0471 Epoch: 12 Global Step: 65070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:13,201-Speed 3421.16 samples/sec Loss 6.5887 LearningRate 0.0471 Epoch: 12 Global Step: 65080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:16,191-Speed 3425.29 samples/sec Loss 6.4842 LearningRate 0.0471 Epoch: 12 Global Step: 65090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:19,191-Speed 3413.56 samples/sec Loss 6.4180 LearningRate 0.0471 Epoch: 12 Global Step: 65100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:22,188-Speed 3418.94 samples/sec Loss 6.3730 LearningRate 0.0471 Epoch: 12 Global Step: 65110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:25,176-Speed 3427.58 samples/sec Loss 6.5262 LearningRate 0.0470 Epoch: 12 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:28,198-Speed 3389.17 samples/sec Loss 6.5729 LearningRate 0.0470 Epoch: 12 Global Step: 65130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:31,305-Speed 3297.34 samples/sec Loss 6.5073 LearningRate 0.0470 Epoch: 12 Global Step: 65140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:34,283-Speed 3438.40 samples/sec Loss 6.5535 LearningRate 0.0470 Epoch: 12 Global Step: 65150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:37,272-Speed 3427.85 samples/sec Loss 6.6391 LearningRate 0.0470 Epoch: 12 Global Step: 65160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:39:40,248-Speed 3441.52 samples/sec Loss 6.5186 LearningRate 0.0470 Epoch: 12 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:39:43,230-Speed 3434.01 samples/sec Loss 6.5911 LearningRate 0.0470 Epoch: 12 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:39:46,208-Speed 3440.40 samples/sec Loss 6.5417 LearningRate 0.0469 Epoch: 12 Global Step: 65190 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:39:49,222-Speed 3398.04 samples/sec Loss 6.5796 LearningRate 0.0469 Epoch: 12 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:39:52,231-Speed 3404.35 samples/sec Loss 6.6628 LearningRate 0.0469 Epoch: 12 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:39:55,239-Speed 3405.07 samples/sec Loss 6.6498 LearningRate 0.0469 Epoch: 12 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:39:58,215-Speed 3441.37 samples/sec Loss 6.4636 LearningRate 0.0469 Epoch: 12 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:01,207-Speed 3424.68 samples/sec Loss 6.4478 LearningRate 0.0469 Epoch: 12 Global Step: 65240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:04,209-Speed 3412.00 samples/sec Loss 6.3965 LearningRate 0.0468 Epoch: 12 Global Step: 65250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:07,205-Speed 3418.74 samples/sec Loss 6.4241 LearningRate 0.0468 Epoch: 12 Global Step: 65260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:10,189-Speed 3432.44 samples/sec Loss 6.5136 LearningRate 0.0468 Epoch: 12 Global Step: 65270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:13,186-Speed 3417.01 samples/sec Loss 6.7188 LearningRate 0.0468 Epoch: 12 Global Step: 65280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:16,203-Speed 3395.98 samples/sec Loss 6.5558 LearningRate 0.0468 Epoch: 12 Global Step: 65290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:19,209-Speed 3406.97 samples/sec Loss 6.6128 LearningRate 0.0468 Epoch: 12 Global Step: 65300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:22,204-Speed 3420.20 samples/sec Loss 6.5791 LearningRate 0.0468 Epoch: 12 Global Step: 65310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-01-19 23:40:25,196-Speed 3423.30 samples/sec Loss 6.6240 LearningRate 0.0467 Epoch: 12 Global Step: 65320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:28,179-Speed 3433.62 samples/sec Loss 6.6312 LearningRate 0.0467 Epoch: 12 Global Step: 65330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:40:31,161-Speed 3436.67 samples/sec Loss 6.5433 LearningRate 0.0467 Epoch: 12 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:34,141-Speed 3436.72 samples/sec Loss 6.5410 LearningRate 0.0467 Epoch: 12 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:37,130-Speed 3427.89 samples/sec Loss 6.4559 LearningRate 0.0467 Epoch: 12 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:40,115-Speed 3431.09 samples/sec Loss 6.7055 LearningRate 0.0467 Epoch: 12 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:43,185-Speed 3335.98 samples/sec Loss 6.4016 LearningRate 0.0466 Epoch: 12 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:46,203-Speed 3394.57 samples/sec Loss 6.4720 LearningRate 0.0466 Epoch: 12 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:49,197-Speed 3420.12 samples/sec Loss 6.5297 LearningRate 0.0466 Epoch: 12 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:52,177-Speed 3438.28 samples/sec Loss 6.3908 LearningRate 0.0466 Epoch: 12 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:55,176-Speed 3414.89 samples/sec Loss 6.5637 LearningRate 0.0466 Epoch: 12 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:40:58,166-Speed 3425.85 samples/sec Loss 6.4171 LearningRate 0.0466 Epoch: 12 Global Step: 65430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:01,153-Speed 3429.46 samples/sec Loss 
6.3741 LearningRate 0.0466 Epoch: 12 Global Step: 65440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:04,143-Speed 3424.78 samples/sec Loss 6.5047 LearningRate 0.0465 Epoch: 12 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:07,140-Speed 3418.16 samples/sec Loss 6.4540 LearningRate 0.0465 Epoch: 12 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:10,143-Speed 3410.12 samples/sec Loss 6.5276 LearningRate 0.0465 Epoch: 12 Global Step: 65470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:13,104-Speed 3458.99 samples/sec Loss 6.5157 LearningRate 0.0465 Epoch: 12 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:16,110-Speed 3408.44 samples/sec Loss 6.7095 LearningRate 0.0465 Epoch: 12 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:19,122-Speed 3399.96 samples/sec Loss 6.4320 LearningRate 0.0465 Epoch: 12 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:22,106-Speed 3433.48 samples/sec Loss 6.5520 LearningRate 0.0465 Epoch: 12 Global Step: 65510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:25,091-Speed 3431.49 samples/sec Loss 6.4824 LearningRate 0.0464 Epoch: 12 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:28,072-Speed 3435.35 samples/sec Loss 6.4977 LearningRate 0.0464 Epoch: 12 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:31,072-Speed 3414.32 samples/sec Loss 6.6701 LearningRate 0.0464 Epoch: 12 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:34,084-Speed 3400.85 samples/sec Loss 6.3881 LearningRate 0.0464 Epoch: 12 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:37,201-Speed 3285.80 samples/sec Loss 6.4729 LearningRate 0.0464 Epoch: 12 Global Step: 
65560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:40,324-Speed 3280.06 samples/sec Loss 6.3874 LearningRate 0.0464 Epoch: 12 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:41:43,324-Speed 3413.57 samples/sec Loss 6.5810 LearningRate 0.0463 Epoch: 12 Global Step: 65580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:46,304-Speed 3437.95 samples/sec Loss 6.5195 LearningRate 0.0463 Epoch: 12 Global Step: 65590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:49,290-Speed 3430.07 samples/sec Loss 6.5024 LearningRate 0.0463 Epoch: 12 Global Step: 65600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:52,270-Speed 3437.04 samples/sec Loss 6.4119 LearningRate 0.0463 Epoch: 12 Global Step: 65610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:55,402-Speed 3270.90 samples/sec Loss 6.5233 LearningRate 0.0463 Epoch: 12 Global Step: 65620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:41:58,387-Speed 3430.65 samples/sec Loss 6.6090 LearningRate 0.0463 Epoch: 12 Global Step: 65630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:42:01,367-Speed 3437.35 samples/sec Loss 6.5496 LearningRate 0.0463 Epoch: 12 Global Step: 65640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:42:04,358-Speed 3424.78 samples/sec Loss 6.5177 LearningRate 0.0462 Epoch: 12 Global Step: 65650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:42:07,338-Speed 3436.81 samples/sec Loss 6.3833 LearningRate 0.0462 Epoch: 12 Global Step: 65660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:42:10,311-Speed 3444.99 samples/sec Loss 6.5717 LearningRate 0.0462 Epoch: 12 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:13,307-Speed 3418.77 samples/sec Loss 6.6408 LearningRate 0.0462 Epoch: 12 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-19 23:42:16,291-Speed 3433.73 samples/sec Loss 6.3727 LearningRate 0.0462 Epoch: 12 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:19,298-Speed 3405.21 samples/sec Loss 6.4749 LearningRate 0.0462 Epoch: 12 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:22,386-Speed 3318.15 samples/sec Loss 6.4297 LearningRate 0.0461 Epoch: 12 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:25,494-Speed 3295.08 samples/sec Loss 6.4562 LearningRate 0.0461 Epoch: 12 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:28,541-Speed 3361.55 samples/sec Loss 6.6986 LearningRate 0.0461 Epoch: 12 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:31,613-Speed 3334.56 samples/sec Loss 6.5373 LearningRate 0.0461 Epoch: 12 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:34,679-Speed 3340.34 samples/sec Loss 6.6309 LearningRate 0.0461 Epoch: 12 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:46,919-Speed 836.68 samples/sec Loss 6.0395 LearningRate 0.0461 Epoch: 13 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:49,897-Speed 3439.29 samples/sec Loss 5.7442 LearningRate 0.0461 Epoch: 13 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:52,885-Speed 3428.77 samples/sec Loss 5.7810 LearningRate 0.0460 Epoch: 13 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:55,867-Speed 3434.86 samples/sec Loss 5.6274 LearningRate 0.0460 Epoch: 13 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:42:58,926-Speed 3347.67 samples/sec Loss 5.7459 LearningRate 0.0460 Epoch: 13 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:43:01,909-Speed 3434.26 
samples/sec Loss 5.6869 LearningRate 0.0460 Epoch: 13 Global Step: 65810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:43:04,983-Speed 3331.46 samples/sec Loss 5.8305 LearningRate 0.0460 Epoch: 13 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:43:07,971-Speed 3428.18 samples/sec Loss 5.6186 LearningRate 0.0460 Epoch: 13 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:43:10,955-Speed 3433.72 samples/sec Loss 5.7483 LearningRate 0.0459 Epoch: 13 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:43:14,006-Speed 3357.08 samples/sec Loss 5.7663 LearningRate 0.0459 Epoch: 13 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:43:17,064-Speed 3350.00 samples/sec Loss 5.8542 LearningRate 0.0459 Epoch: 13 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:43:20,054-Speed 3425.19 samples/sec Loss 5.8571 LearningRate 0.0459 Epoch: 13 Global Step: 65870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:23,072-Speed 3393.81 samples/sec Loss 5.6907 LearningRate 0.0459 Epoch: 13 Global Step: 65880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:26,091-Speed 3393.91 samples/sec Loss 5.8611 LearningRate 0.0459 Epoch: 13 Global Step: 65890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:29,120-Speed 3381.25 samples/sec Loss 5.8239 LearningRate 0.0459 Epoch: 13 Global Step: 65900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:32,153-Speed 3376.64 samples/sec Loss 5.9259 LearningRate 0.0458 Epoch: 13 Global Step: 65910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:35,192-Speed 3370.38 samples/sec Loss 5.7748 LearningRate 0.0458 Epoch: 13 Global Step: 65920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:38,208-Speed 3396.41 samples/sec Loss 5.9972 LearningRate 0.0458 
Epoch: 13 Global Step: 65930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:41,262-Speed 3353.22 samples/sec Loss 5.7423 LearningRate 0.0458 Epoch: 13 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:44,264-Speed 3412.99 samples/sec Loss 5.9157 LearningRate 0.0458 Epoch: 13 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:47,291-Speed 3383.35 samples/sec Loss 5.9921 LearningRate 0.0458 Epoch: 13 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:50,497-Speed 3194.68 samples/sec Loss 5.8371 LearningRate 0.0458 Epoch: 13 Global Step: 65970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:53,492-Speed 3420.76 samples/sec Loss 6.0210 LearningRate 0.0457 Epoch: 13 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:56,484-Speed 3423.45 samples/sec Loss 5.9340 LearningRate 0.0457 Epoch: 13 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:43:59,539-Speed 3352.34 samples/sec Loss 5.9005 LearningRate 0.0457 Epoch: 13 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:44:42,543-[lfw][66000]XNorm: 23.165965 Training: 2022-01-19 23:44:42,544-[lfw][66000]Accuracy-Flip: 0.99733+-0.00249 Training: 2022-01-19 23:44:42,544-[lfw][66000]Accuracy-Highest: 0.99767 Training: 2022-01-19 23:45:32,483-[cfp_fp][66000]XNorm: 20.607939 Training: 2022-01-19 23:45:32,484-[cfp_fp][66000]Accuracy-Flip: 0.97157+-0.01093 Training: 2022-01-19 23:45:32,484-[cfp_fp][66000]Accuracy-Highest: 0.97157 Training: 2022-01-19 23:46:15,329-[agedb_30][66000]XNorm: 22.936747 Training: 2022-01-19 23:46:15,330-[agedb_30][66000]Accuracy-Flip: 0.97550+-0.00506 Training: 2022-01-19 23:46:15,331-[agedb_30][66000]Accuracy-Highest: 0.97800 Training: 2022-01-19 23:46:18,302-Speed 73.80 samples/sec Loss 5.8556 LearningRate 0.0457 Epoch: 13 Global Step: 66010 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:46:21,269-Speed 3452.12 samples/sec Loss 5.9046 LearningRate 0.0457 Epoch: 13 Global Step: 66020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:46:24,249-Speed 3437.01 samples/sec Loss 5.8855 LearningRate 0.0457 Epoch: 13 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:46:27,222-Speed 3451.27 samples/sec Loss 5.9390 LearningRate 0.0456 Epoch: 13 Global Step: 66040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:30,227-Speed 3408.06 samples/sec Loss 5.9940 LearningRate 0.0456 Epoch: 13 Global Step: 66050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:33,203-Speed 3442.89 samples/sec Loss 5.9561 LearningRate 0.0456 Epoch: 13 Global Step: 66060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:36,228-Speed 3385.39 samples/sec Loss 5.7786 LearningRate 0.0456 Epoch: 13 Global Step: 66070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:39,281-Speed 3354.35 samples/sec Loss 5.9644 LearningRate 0.0456 Epoch: 13 Global Step: 66080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:42,258-Speed 3441.70 samples/sec Loss 5.8229 LearningRate 0.0456 Epoch: 13 Global Step: 66090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:45,245-Speed 3429.35 samples/sec Loss 6.0783 LearningRate 0.0456 Epoch: 13 Global Step: 66100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:48,216-Speed 3447.19 samples/sec Loss 5.9647 LearningRate 0.0455 Epoch: 13 Global Step: 66110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:51,206-Speed 3426.21 samples/sec Loss 6.2286 LearningRate 0.0455 Epoch: 13 Global Step: 66120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:46:54,192-Speed 3429.27 samples/sec Loss 6.0012 LearningRate 0.0455 Epoch: 13 Global Step: 66130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-01-19 23:46:57,223-Speed 3379.71 samples/sec Loss 6.0160 LearningRate 0.0455 Epoch: 13 Global Step: 66140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:00,200-Speed 3440.91 samples/sec Loss 5.9334 LearningRate 0.0455 Epoch: 13 Global Step: 66150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:03,180-Speed 3436.89 samples/sec Loss 5.9744 LearningRate 0.0455 Epoch: 13 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:06,158-Speed 3439.95 samples/sec Loss 6.0608 LearningRate 0.0455 Epoch: 13 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:09,137-Speed 3437.57 samples/sec Loss 5.9654 LearningRate 0.0454 Epoch: 13 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:12,115-Speed 3440.31 samples/sec Loss 6.0072 LearningRate 0.0454 Epoch: 13 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:15,095-Speed 3437.46 samples/sec Loss 6.1423 LearningRate 0.0454 Epoch: 13 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:18,215-Speed 3282.91 samples/sec Loss 6.1383 LearningRate 0.0454 Epoch: 13 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:21,230-Speed 3397.26 samples/sec Loss 5.7964 LearningRate 0.0454 Epoch: 13 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:24,295-Speed 3341.78 samples/sec Loss 6.0415 LearningRate 0.0454 Epoch: 13 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:27,303-Speed 3405.11 samples/sec Loss 5.9984 LearningRate 0.0453 Epoch: 13 Global Step: 66240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:47:30,296-Speed 3421.82 samples/sec Loss 5.9260 LearningRate 0.0453 Epoch: 13 Global Step: 66250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:47:33,270-Speed 3445.19 samples/sec 
Loss 5.9428 LearningRate 0.0453 Epoch: 13 Global Step: 66260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:47:36,270-Speed 3414.20 samples/sec Loss 6.1683 LearningRate 0.0453 Epoch: 13 Global Step: 66270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:47:39,243-Speed 3444.93 samples/sec Loss 6.1683 LearningRate 0.0453 Epoch: 13 Global Step: 66280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:42,227-Speed 3432.96 samples/sec Loss 6.0286 LearningRate 0.0453 Epoch: 13 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:45,286-Speed 3348.22 samples/sec Loss 6.0441 LearningRate 0.0453 Epoch: 13 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:48,368-Speed 3323.76 samples/sec Loss 6.1439 LearningRate 0.0452 Epoch: 13 Global Step: 66310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:51,335-Speed 3451.34 samples/sec Loss 6.0576 LearningRate 0.0452 Epoch: 13 Global Step: 66320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:54,304-Speed 3449.50 samples/sec Loss 6.0804 LearningRate 0.0452 Epoch: 13 Global Step: 66330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:47:57,273-Speed 3450.49 samples/sec Loss 6.0493 LearningRate 0.0452 Epoch: 13 Global Step: 66340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:00,243-Speed 3448.62 samples/sec Loss 6.0044 LearningRate 0.0452 Epoch: 13 Global Step: 66350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:03,231-Speed 3428.17 samples/sec Loss 6.1304 LearningRate 0.0452 Epoch: 13 Global Step: 66360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:06,205-Speed 3444.78 samples/sec Loss 6.1890 LearningRate 0.0451 Epoch: 13 Global Step: 66370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:09,159-Speed 3466.81 samples/sec Loss 6.2073 LearningRate 0.0451 Epoch: 13 Global 
Step: 66380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:12,127-Speed 3451.75 samples/sec Loss 6.1986 LearningRate 0.0451 Epoch: 13 Global Step: 66390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:15,098-Speed 3446.51 samples/sec Loss 6.1504 LearningRate 0.0451 Epoch: 13 Global Step: 66400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:18,066-Speed 3452.20 samples/sec Loss 6.0921 LearningRate 0.0451 Epoch: 13 Global Step: 66410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:21,053-Speed 3428.55 samples/sec Loss 6.1687 LearningRate 0.0451 Epoch: 13 Global Step: 66420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:24,030-Speed 3439.96 samples/sec Loss 5.9894 LearningRate 0.0451 Epoch: 13 Global Step: 66430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:27,004-Speed 3444.53 samples/sec Loss 6.1439 LearningRate 0.0450 Epoch: 13 Global Step: 66440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:30,153-Speed 3252.97 samples/sec Loss 6.1122 LearningRate 0.0450 Epoch: 13 Global Step: 66450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:33,237-Speed 3321.28 samples/sec Loss 6.2283 LearningRate 0.0450 Epoch: 13 Global Step: 66460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:36,210-Speed 3445.02 samples/sec Loss 6.0991 LearningRate 0.0450 Epoch: 13 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:48:39,198-Speed 3427.84 samples/sec Loss 6.2129 LearningRate 0.0450 Epoch: 13 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:48:42,344-Speed 3256.36 samples/sec Loss 6.1179 LearningRate 0.0450 Epoch: 13 Global Step: 66490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:48:45,333-Speed 3425.94 samples/sec Loss 6.0379 LearningRate 0.0450 Epoch: 13 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-01-19 23:48:48,302-Speed 3450.75 samples/sec Loss 6.1635 LearningRate 0.0449 Epoch: 13 Global Step: 66510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:48:51,270-Speed 3451.02 samples/sec Loss 6.2231 LearningRate 0.0449 Epoch: 13 Global Step: 66520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:48:54,241-Speed 3447.56 samples/sec Loss 6.0150 LearningRate 0.0449 Epoch: 13 Global Step: 66530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:48:57,262-Speed 3389.92 samples/sec Loss 6.1052 LearningRate 0.0449 Epoch: 13 Global Step: 66540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:49:00,403-Speed 3261.45 samples/sec Loss 6.1353 LearningRate 0.0449 Epoch: 13 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:03,494-Speed 3314.40 samples/sec Loss 6.1575 LearningRate 0.0449 Epoch: 13 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:06,492-Speed 3415.91 samples/sec Loss 6.2000 LearningRate 0.0448 Epoch: 13 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:09,462-Speed 3448.90 samples/sec Loss 6.1029 LearningRate 0.0448 Epoch: 13 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:12,437-Speed 3442.39 samples/sec Loss 6.0986 LearningRate 0.0448 Epoch: 13 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:15,413-Speed 3442.05 samples/sec Loss 6.2378 LearningRate 0.0448 Epoch: 13 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:18,396-Speed 3433.34 samples/sec Loss 6.3233 LearningRate 0.0448 Epoch: 13 Global Step: 66610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:21,378-Speed 3434.76 samples/sec Loss 6.2770 LearningRate 0.0448 Epoch: 13 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:24,463-Speed 3320.53 
samples/sec Loss 6.1611 LearningRate 0.0448 Epoch: 13 Global Step: 66630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:27,504-Speed 3368.20 samples/sec Loss 6.3189 LearningRate 0.0447 Epoch: 13 Global Step: 66640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:30,484-Speed 3436.73 samples/sec Loss 6.1494 LearningRate 0.0447 Epoch: 13 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:49:33,458-Speed 3444.54 samples/sec Loss 6.1284 LearningRate 0.0447 Epoch: 13 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:49:36,455-Speed 3417.73 samples/sec Loss 5.9974 LearningRate 0.0447 Epoch: 13 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:49:39,596-Speed 3261.24 samples/sec Loss 6.1707 LearningRate 0.0447 Epoch: 13 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:49:42,611-Speed 3397.33 samples/sec Loss 6.2177 LearningRate 0.0447 Epoch: 13 Global Step: 66690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:49:45,596-Speed 3431.08 samples/sec Loss 6.1159 LearningRate 0.0447 Epoch: 13 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:49:48,568-Speed 3447.19 samples/sec Loss 6.3284 LearningRate 0.0446 Epoch: 13 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:49:51,562-Speed 3419.87 samples/sec Loss 6.2109 LearningRate 0.0446 Epoch: 13 Global Step: 66720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:54,569-Speed 3406.62 samples/sec Loss 6.2350 LearningRate 0.0446 Epoch: 13 Global Step: 66730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:49:57,560-Speed 3425.11 samples/sec Loss 6.1797 LearningRate 0.0446 Epoch: 13 Global Step: 66740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:00,574-Speed 3397.82 samples/sec Loss 6.1551 LearningRate 0.0446 
Epoch: 13 Global Step: 66750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:03,593-Speed 3393.23 samples/sec Loss 6.1709 LearningRate 0.0446 Epoch: 13 Global Step: 66760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:06,564-Speed 3447.96 samples/sec Loss 6.1546 LearningRate 0.0446 Epoch: 13 Global Step: 66770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:09,535-Speed 3446.64 samples/sec Loss 6.1172 LearningRate 0.0445 Epoch: 13 Global Step: 66780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:12,520-Speed 3432.42 samples/sec Loss 6.2886 LearningRate 0.0445 Epoch: 13 Global Step: 66790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:15,546-Speed 3384.30 samples/sec Loss 6.1306 LearningRate 0.0445 Epoch: 13 Global Step: 66800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:18,641-Speed 3309.29 samples/sec Loss 6.2819 LearningRate 0.0445 Epoch: 13 Global Step: 66810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:21,613-Speed 3446.44 samples/sec Loss 6.2639 LearningRate 0.0445 Epoch: 13 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:50:24,590-Speed 3441.68 samples/sec Loss 6.1904 LearningRate 0.0445 Epoch: 13 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:50:27,564-Speed 3444.11 samples/sec Loss 6.1885 LearningRate 0.0444 Epoch: 13 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:50:30,539-Speed 3442.01 samples/sec Loss 6.2818 LearningRate 0.0444 Epoch: 13 Global Step: 66850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:50:33,494-Speed 3467.26 samples/sec Loss 6.3032 LearningRate 0.0444 Epoch: 13 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:36,479-Speed 3431.04 samples/sec Loss 6.3327 LearningRate 0.0444 Epoch: 13 Global Step: 66870 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-19 23:50:39,500-Speed 3390.31 samples/sec Loss 6.1715 LearningRate 0.0444 Epoch: 13 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:42,482-Speed 3434.79 samples/sec Loss 6.2154 LearningRate 0.0444 Epoch: 13 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:45,522-Speed 3369.04 samples/sec Loss 6.1932 LearningRate 0.0444 Epoch: 13 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:48,522-Speed 3415.15 samples/sec Loss 6.0984 LearningRate 0.0443 Epoch: 13 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:51,498-Speed 3441.87 samples/sec Loss 6.3605 LearningRate 0.0443 Epoch: 13 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:54,563-Speed 3341.17 samples/sec Loss 6.1818 LearningRate 0.0443 Epoch: 13 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:50:57,577-Speed 3399.26 samples/sec Loss 6.4396 LearningRate 0.0443 Epoch: 13 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:00,554-Speed 3439.40 samples/sec Loss 6.1951 LearningRate 0.0443 Epoch: 13 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:03,576-Speed 3390.49 samples/sec Loss 6.1217 LearningRate 0.0443 Epoch: 13 Global Step: 66960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:51:06,595-Speed 3391.86 samples/sec Loss 6.1190 LearningRate 0.0443 Epoch: 13 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:09,612-Speed 3395.32 samples/sec Loss 6.2576 LearningRate 0.0442 Epoch: 13 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:12,595-Speed 3434.19 samples/sec Loss 6.3207 LearningRate 0.0442 Epoch: 13 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 
23:51:15,611-Speed 3395.77 samples/sec Loss 6.1105 LearningRate 0.0442 Epoch: 13 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:18,603-Speed 3423.59 samples/sec Loss 6.1915 LearningRate 0.0442 Epoch: 13 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:21,595-Speed 3423.22 samples/sec Loss 6.2800 LearningRate 0.0442 Epoch: 13 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:24,574-Speed 3438.40 samples/sec Loss 6.2108 LearningRate 0.0442 Epoch: 13 Global Step: 67030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:27,696-Speed 3281.16 samples/sec Loss 6.2063 LearningRate 0.0441 Epoch: 13 Global Step: 67040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:30,743-Speed 3361.05 samples/sec Loss 6.2691 LearningRate 0.0441 Epoch: 13 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:33,737-Speed 3421.28 samples/sec Loss 6.1964 LearningRate 0.0441 Epoch: 13 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:36,716-Speed 3437.80 samples/sec Loss 6.2810 LearningRate 0.0441 Epoch: 13 Global Step: 67070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:51:39,713-Speed 3418.15 samples/sec Loss 6.2220 LearningRate 0.0441 Epoch: 13 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:42,687-Speed 3443.68 samples/sec Loss 6.4319 LearningRate 0.0441 Epoch: 13 Global Step: 67090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:45,667-Speed 3437.73 samples/sec Loss 6.3033 LearningRate 0.0441 Epoch: 13 Global Step: 67100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:48,645-Speed 3439.53 samples/sec Loss 6.2580 LearningRate 0.0440 Epoch: 13 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:51,624-Speed 3438.70 samples/sec Loss 6.2081 
LearningRate 0.0440 Epoch: 13 Global Step: 67120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:54,601-Speed 3440.31 samples/sec Loss 6.1927 LearningRate 0.0440 Epoch: 13 Global Step: 67130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:51:57,602-Speed 3412.69 samples/sec Loss 6.1206 LearningRate 0.0440 Epoch: 13 Global Step: 67140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:00,661-Speed 3348.36 samples/sec Loss 6.1774 LearningRate 0.0440 Epoch: 13 Global Step: 67150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:03,773-Speed 3291.57 samples/sec Loss 6.2043 LearningRate 0.0440 Epoch: 13 Global Step: 67160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:06,762-Speed 3426.33 samples/sec Loss 6.3122 LearningRate 0.0440 Epoch: 13 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:09,751-Speed 3427.36 samples/sec Loss 6.1583 LearningRate 0.0439 Epoch: 13 Global Step: 67180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:52:12,734-Speed 3434.47 samples/sec Loss 6.3446 LearningRate 0.0439 Epoch: 13 Global Step: 67190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:52:15,706-Speed 3446.46 samples/sec Loss 6.2574 LearningRate 0.0439 Epoch: 13 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:18,691-Speed 3431.24 samples/sec Loss 6.3319 LearningRate 0.0439 Epoch: 13 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:21,669-Speed 3439.61 samples/sec Loss 6.1756 LearningRate 0.0439 Epoch: 13 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:24,685-Speed 3395.35 samples/sec Loss 6.1728 LearningRate 0.0439 Epoch: 13 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:27,799-Speed 3289.63 samples/sec Loss 6.2730 LearningRate 0.0439 Epoch: 13 Global Step: 67240 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:52:30,796-Speed 3417.47 samples/sec Loss 6.1129 LearningRate 0.0438 Epoch: 13 Global Step: 67250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:33,772-Speed 3442.31 samples/sec Loss 6.0970 LearningRate 0.0438 Epoch: 13 Global Step: 67260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:36,797-Speed 3385.46 samples/sec Loss 6.2858 LearningRate 0.0438 Epoch: 13 Global Step: 67270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:39,775-Speed 3439.68 samples/sec Loss 6.1890 LearningRate 0.0438 Epoch: 13 Global Step: 67280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:42,765-Speed 3426.65 samples/sec Loss 6.0696 LearningRate 0.0438 Epoch: 13 Global Step: 67290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:45,745-Speed 3436.18 samples/sec Loss 6.1952 LearningRate 0.0438 Epoch: 13 Global Step: 67300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:48,722-Speed 3441.65 samples/sec Loss 6.3457 LearningRate 0.0437 Epoch: 13 Global Step: 67310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:51,702-Speed 3436.88 samples/sec Loss 6.2331 LearningRate 0.0437 Epoch: 13 Global Step: 67320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:54,691-Speed 3426.09 samples/sec Loss 6.4044 LearningRate 0.0437 Epoch: 13 Global Step: 67330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:52:57,685-Speed 3421.31 samples/sec Loss 6.3177 LearningRate 0.0437 Epoch: 13 Global Step: 67340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:00,715-Speed 3380.14 samples/sec Loss 6.2800 LearningRate 0.0437 Epoch: 13 Global Step: 67350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:53:03,750-Speed 3375.37 samples/sec Loss 6.1572 LearningRate 0.0437 Epoch: 13 Global Step: 67360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-19 23:53:06,727-Speed 3440.33 samples/sec Loss 6.3800 LearningRate 0.0437 Epoch: 13 Global Step: 67370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:53:09,702-Speed 3443.46 samples/sec Loss 6.2649 LearningRate 0.0436 Epoch: 13 Global Step: 67380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:53:12,674-Speed 3446.71 samples/sec Loss 6.2986 LearningRate 0.0436 Epoch: 13 Global Step: 67390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:53:15,658-Speed 3432.30 samples/sec Loss 6.3240 LearningRate 0.0436 Epoch: 13 Global Step: 67400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:53:18,662-Speed 3409.92 samples/sec Loss 6.2006 LearningRate 0.0436 Epoch: 13 Global Step: 67410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:21,659-Speed 3417.34 samples/sec Loss 6.2853 LearningRate 0.0436 Epoch: 13 Global Step: 67420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:24,754-Speed 3309.13 samples/sec Loss 6.2540 LearningRate 0.0436 Epoch: 13 Global Step: 67430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:27,926-Speed 3229.37 samples/sec Loss 6.1905 LearningRate 0.0436 Epoch: 13 Global Step: 67440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:31,000-Speed 3331.61 samples/sec Loss 6.3026 LearningRate 0.0435 Epoch: 13 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:33,973-Speed 3446.70 samples/sec Loss 6.1739 LearningRate 0.0435 Epoch: 13 Global Step: 67460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:36,958-Speed 3430.68 samples/sec Loss 6.4714 LearningRate 0.0435 Epoch: 13 Global Step: 67470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:39,977-Speed 3393.47 samples/sec Loss 6.2265 LearningRate 0.0435 Epoch: 13 Global Step: 67480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:42,956-Speed 3437.99 samples/sec Loss 
6.2185 LearningRate 0.0435 Epoch: 13 Global Step: 67490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:45,928-Speed 3445.96 samples/sec Loss 6.4702 LearningRate 0.0435 Epoch: 13 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:53:48,922-Speed 3422.16 samples/sec Loss 6.2060 LearningRate 0.0435 Epoch: 13 Global Step: 67510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:53:51,922-Speed 3413.39 samples/sec Loss 6.3455 LearningRate 0.0434 Epoch: 13 Global Step: 67520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:53:54,898-Speed 3441.31 samples/sec Loss 6.2189 LearningRate 0.0434 Epoch: 13 Global Step: 67530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:53:57,892-Speed 3422.24 samples/sec Loss 6.1747 LearningRate 0.0434 Epoch: 13 Global Step: 67540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:00,881-Speed 3426.05 samples/sec Loss 6.2172 LearningRate 0.0434 Epoch: 13 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:03,907-Speed 3430.55 samples/sec Loss 6.3906 LearningRate 0.0434 Epoch: 13 Global Step: 67560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:06,885-Speed 3439.05 samples/sec Loss 6.1926 LearningRate 0.0434 Epoch: 13 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:09,910-Speed 3385.78 samples/sec Loss 6.2361 LearningRate 0.0433 Epoch: 13 Global Step: 67580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:12,948-Speed 3431.47 samples/sec Loss 6.2694 LearningRate 0.0433 Epoch: 13 Global Step: 67590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:15,947-Speed 3414.46 samples/sec Loss 6.3392 LearningRate 0.0433 Epoch: 13 Global Step: 67600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:18,988-Speed 3438.06 samples/sec Loss 6.2582 LearningRate 0.0433 Epoch: 13 Global Step: 67610 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:54:21,971-Speed 3433.16 samples/sec Loss 6.1971 LearningRate 0.0433 Epoch: 13 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:54:24,983-Speed 3400.23 samples/sec Loss 6.2177 LearningRate 0.0433 Epoch: 13 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:54:28,122-Speed 3364.61 samples/sec Loss 6.3853 LearningRate 0.0433 Epoch: 13 Global Step: 67640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:54:31,200-Speed 3328.10 samples/sec Loss 6.3964 LearningRate 0.0432 Epoch: 13 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:54:34,308-Speed 3295.71 samples/sec Loss 6.3065 LearningRate 0.0432 Epoch: 13 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:54:37,311-Speed 3410.64 samples/sec Loss 6.1664 LearningRate 0.0432 Epoch: 13 Global Step: 67670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:54:40,273-Speed 3457.22 samples/sec Loss 6.3159 LearningRate 0.0432 Epoch: 13 Global Step: 67680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:43,255-Speed 3435.92 samples/sec Loss 6.3751 LearningRate 0.0432 Epoch: 13 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:46,231-Speed 3440.94 samples/sec Loss 6.4058 LearningRate 0.0432 Epoch: 13 Global Step: 67700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:49,214-Speed 3434.14 samples/sec Loss 6.2260 LearningRate 0.0432 Epoch: 13 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:52,214-Speed 3414.38 samples/sec Loss 6.2507 LearningRate 0.0431 Epoch: 13 Global Step: 67720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:54:55,203-Speed 3426.81 samples/sec Loss 6.2320 LearningRate 0.0431 Epoch: 13 Global Step: 67730 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-19 23:54:58,252-Speed 3359.19 samples/sec Loss 6.4186 LearningRate 0.0431 Epoch: 13 Global Step: 67740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:01,233-Speed 3436.07 samples/sec Loss 6.2862 LearningRate 0.0431 Epoch: 13 Global Step: 67750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:04,221-Speed 3429.00 samples/sec Loss 6.0979 LearningRate 0.0431 Epoch: 13 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:07,212-Speed 3423.62 samples/sec Loss 6.3110 LearningRate 0.0431 Epoch: 13 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:10,196-Speed 3432.78 samples/sec Loss 6.1744 LearningRate 0.0431 Epoch: 13 Global Step: 67780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:55:13,206-Speed 3402.72 samples/sec Loss 6.1885 LearningRate 0.0430 Epoch: 13 Global Step: 67790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:16,295-Speed 3315.43 samples/sec Loss 6.1719 LearningRate 0.0430 Epoch: 13 Global Step: 67800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:19,278-Speed 3434.18 samples/sec Loss 6.2788 LearningRate 0.0430 Epoch: 13 Global Step: 67810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:22,241-Speed 3456.76 samples/sec Loss 6.1885 LearningRate 0.0430 Epoch: 13 Global Step: 67820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:25,219-Speed 3439.63 samples/sec Loss 6.3078 LearningRate 0.0430 Epoch: 13 Global Step: 67830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:28,200-Speed 3437.01 samples/sec Loss 6.0657 LearningRate 0.0430 Epoch: 13 Global Step: 67840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:31,194-Speed 3420.56 samples/sec Loss 6.2596 LearningRate 0.0430 Epoch: 13 Global Step: 67850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:34,212-Speed 3394.12 
samples/sec Loss 6.2520 LearningRate 0.0429 Epoch: 13 Global Step: 67860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:37,192-Speed 3437.49 samples/sec Loss 6.2498 LearningRate 0.0429 Epoch: 13 Global Step: 67870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:40,197-Speed 3407.57 samples/sec Loss 6.2539 LearningRate 0.0429 Epoch: 13 Global Step: 67880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:43,175-Speed 3440.44 samples/sec Loss 6.2591 LearningRate 0.0429 Epoch: 13 Global Step: 67890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:46,155-Speed 3436.21 samples/sec Loss 6.2932 LearningRate 0.0429 Epoch: 13 Global Step: 67900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:49,145-Speed 3426.35 samples/sec Loss 6.1785 LearningRate 0.0429 Epoch: 13 Global Step: 67910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:55:52,137-Speed 3423.32 samples/sec Loss 6.3675 LearningRate 0.0428 Epoch: 13 Global Step: 67920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:55,207-Speed 3336.50 samples/sec Loss 6.1778 LearningRate 0.0428 Epoch: 13 Global Step: 67930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:55:58,192-Speed 3431.89 samples/sec Loss 6.4310 LearningRate 0.0428 Epoch: 13 Global Step: 67940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:56:01,186-Speed 3420.56 samples/sec Loss 6.4647 LearningRate 0.0428 Epoch: 13 Global Step: 67950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:56:04,242-Speed 3352.26 samples/sec Loss 6.2101 LearningRate 0.0428 Epoch: 13 Global Step: 67960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:56:07,241-Speed 3415.67 samples/sec Loss 6.1977 LearningRate 0.0428 Epoch: 13 Global Step: 67970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:56:10,291-Speed 3357.28 samples/sec Loss 6.3454 LearningRate 0.0428 Epoch: 13 
Global Step: 67980 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-01-19 23:56:13,327-Speed 3373.82 samples/sec Loss 6.2920 LearningRate 0.0427 Epoch: 13 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-01-19 23:56:16,431-Speed 3300.29 samples/sec Loss 6.2383 LearningRate 0.0427 Epoch: 13 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-01-19 23:56:59,628-[lfw][68000]XNorm: 22.923274
Training: 2022-01-19 23:56:59,628-[lfw][68000]Accuracy-Flip: 0.99750+-0.00271
Training: 2022-01-19 23:56:59,629-[lfw][68000]Accuracy-Highest: 0.99767
Training: 2022-01-19 23:57:49,833-[cfp_fp][68000]XNorm: 19.782137
Training: 2022-01-19 23:57:49,834-[cfp_fp][68000]Accuracy-Flip: 0.97486+-0.00780
Training: 2022-01-19 23:57:49,835-[cfp_fp][68000]Accuracy-Highest: 0.97486
Training: 2022-01-19 23:58:33,064-[agedb_30][68000]XNorm: 22.471789
Training: 2022-01-19 23:58:33,065-[agedb_30][68000]Accuracy-Flip: 0.97733+-0.00688
Training: 2022-01-19 23:58:33,066-[agedb_30][68000]Accuracy-Highest: 0.97800
Training: 2022-01-19 23:58:36,093-Speed 73.32 samples/sec Loss 6.2692 LearningRate 0.0427 Epoch: 13 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-01-19 23:58:39,109-Speed 3395.80 samples/sec Loss 6.2955 LearningRate 0.0427 Epoch: 13 Global Step: 68020 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-19 23:58:42,134-Speed 3385.35 samples/sec Loss 6.1199 LearningRate 0.0427 Epoch: 13 Global Step: 68030 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-19 23:58:45,144-Speed 3403.84 samples/sec Loss 6.2270 LearningRate 0.0427 Epoch: 13 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-01-19 23:58:48,144-Speed 3413.63 samples/sec Loss 6.4189 LearningRate 0.0427 Epoch: 13 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 6 hours
Training: 2022-01-19 23:58:51,126-Speed 3435.66 samples/sec Loss 6.4641 LearningRate 0.0426 Epoch: 13 Global Step: 68060 Fp16 Grad Scale:
65536 Required: 6 hours Training: 2022-01-19 23:58:54,109-Speed 3433.72 samples/sec Loss 6.2841 LearningRate 0.0426 Epoch: 13 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:58:57,088-Speed 3438.42 samples/sec Loss 6.2830 LearningRate 0.0426 Epoch: 13 Global Step: 68080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:59:00,081-Speed 3422.03 samples/sec Loss 6.2593 LearningRate 0.0426 Epoch: 13 Global Step: 68090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:59:03,063-Speed 3434.78 samples/sec Loss 6.1829 LearningRate 0.0426 Epoch: 13 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:59:06,078-Speed 3397.97 samples/sec Loss 6.2250 LearningRate 0.0426 Epoch: 13 Global Step: 68110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:59:09,057-Speed 3437.98 samples/sec Loss 6.3778 LearningRate 0.0426 Epoch: 13 Global Step: 68120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:59:12,031-Speed 3443.90 samples/sec Loss 6.1852 LearningRate 0.0425 Epoch: 13 Global Step: 68130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:59:15,159-Speed 3275.16 samples/sec Loss 6.1874 LearningRate 0.0425 Epoch: 13 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:59:18,229-Speed 3335.92 samples/sec Loss 6.1773 LearningRate 0.0425 Epoch: 13 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:59:21,218-Speed 3427.30 samples/sec Loss 6.1873 LearningRate 0.0425 Epoch: 13 Global Step: 68160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:59:24,200-Speed 3434.22 samples/sec Loss 6.2862 LearningRate 0.0425 Epoch: 13 Global Step: 68170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-19 23:59:27,164-Speed 3456.02 samples/sec Loss 6.3209 LearningRate 0.0425 Epoch: 13 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 
23:59:30,162-Speed 3417.36 samples/sec Loss 6.1774 LearningRate 0.0425 Epoch: 13 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-19 23:59:33,124-Speed 3457.00 samples/sec Loss 6.2324 LearningRate 0.0424 Epoch: 13 Global Step: 68200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:59:36,107-Speed 3434.11 samples/sec Loss 6.0744 LearningRate 0.0424 Epoch: 13 Global Step: 68210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:59:39,080-Speed 3445.27 samples/sec Loss 6.1921 LearningRate 0.0424 Epoch: 13 Global Step: 68220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:59:42,065-Speed 3431.72 samples/sec Loss 6.1605 LearningRate 0.0424 Epoch: 13 Global Step: 68230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:59:45,047-Speed 3435.52 samples/sec Loss 6.2573 LearningRate 0.0424 Epoch: 13 Global Step: 68240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:59:48,069-Speed 3388.50 samples/sec Loss 6.1794 LearningRate 0.0424 Epoch: 13 Global Step: 68250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:59:51,145-Speed 3330.34 samples/sec Loss 6.2059 LearningRate 0.0424 Epoch: 13 Global Step: 68260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:59:54,138-Speed 3421.94 samples/sec Loss 6.3383 LearningRate 0.0423 Epoch: 13 Global Step: 68270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-19 23:59:57,123-Speed 3431.25 samples/sec Loss 6.0820 LearningRate 0.0423 Epoch: 13 Global Step: 68280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:00:00,103-Speed 3437.49 samples/sec Loss 6.2998 LearningRate 0.0423 Epoch: 13 Global Step: 68290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:00:03,086-Speed 3434.02 samples/sec Loss 6.2542 LearningRate 0.0423 Epoch: 13 Global Step: 68300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:06,072-Speed 3431.09 samples/sec Loss 6.1834 
LearningRate 0.0423 Epoch: 13 Global Step: 68310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:09,075-Speed 3411.31 samples/sec Loss 6.3890 LearningRate 0.0423 Epoch: 13 Global Step: 68320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:12,119-Speed 3364.17 samples/sec Loss 6.1097 LearningRate 0.0423 Epoch: 13 Global Step: 68330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:15,096-Speed 3441.06 samples/sec Loss 6.2550 LearningRate 0.0422 Epoch: 13 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:18,099-Speed 3410.75 samples/sec Loss 6.2161 LearningRate 0.0422 Epoch: 13 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:21,090-Speed 3425.17 samples/sec Loss 6.1704 LearningRate 0.0422 Epoch: 13 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:24,063-Speed 3444.23 samples/sec Loss 6.2500 LearningRate 0.0422 Epoch: 13 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:27,040-Speed 3440.73 samples/sec Loss 6.1939 LearningRate 0.0422 Epoch: 13 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:30,046-Speed 3407.65 samples/sec Loss 6.2175 LearningRate 0.0422 Epoch: 13 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:33,044-Speed 3416.90 samples/sec Loss 6.3028 LearningRate 0.0421 Epoch: 13 Global Step: 68400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:00:36,139-Speed 3309.83 samples/sec Loss 6.0825 LearningRate 0.0421 Epoch: 13 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:39,140-Speed 3413.37 samples/sec Loss 6.2070 LearningRate 0.0421 Epoch: 13 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:42,124-Speed 3431.28 samples/sec Loss 6.2375 LearningRate 0.0421 Epoch: 13 Global Step: 68430 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:45,112-Speed 3429.28 samples/sec Loss 6.2479 LearningRate 0.0421 Epoch: 13 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:48,087-Speed 3442.70 samples/sec Loss 6.2649 LearningRate 0.0421 Epoch: 13 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:51,187-Speed 3303.86 samples/sec Loss 6.3011 LearningRate 0.0421 Epoch: 13 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:54,344-Speed 3244.81 samples/sec Loss 6.1772 LearningRate 0.0420 Epoch: 13 Global Step: 68470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:00:57,349-Speed 3408.62 samples/sec Loss 6.0424 LearningRate 0.0420 Epoch: 13 Global Step: 68480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:00,328-Speed 3439.28 samples/sec Loss 6.1257 LearningRate 0.0420 Epoch: 13 Global Step: 68490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:03,324-Speed 3418.50 samples/sec Loss 6.2711 LearningRate 0.0420 Epoch: 13 Global Step: 68500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:06,366-Speed 3367.12 samples/sec Loss 6.1600 LearningRate 0.0420 Epoch: 13 Global Step: 68510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:01:09,387-Speed 3390.24 samples/sec Loss 6.1677 LearningRate 0.0420 Epoch: 13 Global Step: 68520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:01:12,461-Speed 3332.44 samples/sec Loss 6.0480 LearningRate 0.0420 Epoch: 13 Global Step: 68530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:01:15,472-Speed 3402.15 samples/sec Loss 5.9876 LearningRate 0.0419 Epoch: 13 Global Step: 68540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:01:18,446-Speed 3443.88 samples/sec Loss 6.2186 LearningRate 0.0419 Epoch: 13 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-01-20 00:01:21,403-Speed 3463.07 samples/sec Loss 6.1363 LearningRate 0.0419 Epoch: 13 Global Step: 68560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:24,379-Speed 3441.84 samples/sec Loss 6.1946 LearningRate 0.0419 Epoch: 13 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:27,358-Speed 3438.35 samples/sec Loss 6.1525 LearningRate 0.0419 Epoch: 13 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:30,355-Speed 3418.22 samples/sec Loss 6.2245 LearningRate 0.0419 Epoch: 13 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:33,355-Speed 3414.63 samples/sec Loss 6.1304 LearningRate 0.0419 Epoch: 13 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:36,347-Speed 3422.54 samples/sec Loss 6.3058 LearningRate 0.0418 Epoch: 13 Global Step: 68610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:39,396-Speed 3360.44 samples/sec Loss 6.2448 LearningRate 0.0418 Epoch: 13 Global Step: 68620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:42,406-Speed 3401.89 samples/sec Loss 6.2234 LearningRate 0.0418 Epoch: 13 Global Step: 68630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:45,390-Speed 3434.00 samples/sec Loss 6.2506 LearningRate 0.0418 Epoch: 13 Global Step: 68640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:48,416-Speed 3384.76 samples/sec Loss 6.2553 LearningRate 0.0418 Epoch: 13 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:51,393-Speed 3441.36 samples/sec Loss 6.1187 LearningRate 0.0418 Epoch: 13 Global Step: 68660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:01:54,355-Speed 3458.48 samples/sec Loss 6.4048 LearningRate 0.0418 Epoch: 13 Global Step: 68670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:01:57,347-Speed 3423.73 samples/sec Loss 
6.2657 LearningRate 0.0417 Epoch: 13 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:00,324-Speed 3440.78 samples/sec Loss 6.2677 LearningRate 0.0417 Epoch: 13 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:03,301-Speed 3440.10 samples/sec Loss 6.3251 LearningRate 0.0417 Epoch: 13 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:06,298-Speed 3417.76 samples/sec Loss 6.2816 LearningRate 0.0417 Epoch: 13 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:09,291-Speed 3422.27 samples/sec Loss 6.3959 LearningRate 0.0417 Epoch: 13 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:12,309-Speed 3394.11 samples/sec Loss 6.1091 LearningRate 0.0417 Epoch: 13 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:15,302-Speed 3422.59 samples/sec Loss 6.2140 LearningRate 0.0417 Epoch: 13 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:18,296-Speed 3421.22 samples/sec Loss 6.1445 LearningRate 0.0416 Epoch: 13 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:21,313-Speed 3394.79 samples/sec Loss 6.3400 LearningRate 0.0416 Epoch: 13 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:24,294-Speed 3436.15 samples/sec Loss 6.2519 LearningRate 0.0416 Epoch: 13 Global Step: 68770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:02:27,268-Speed 3445.09 samples/sec Loss 6.1899 LearningRate 0.0416 Epoch: 13 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:30,247-Speed 3437.60 samples/sec Loss 6.1499 LearningRate 0.0416 Epoch: 13 Global Step: 68790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:33,226-Speed 3438.48 samples/sec Loss 6.1304 LearningRate 0.0416 Epoch: 13 Global Step: 
68800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:36,224-Speed 3417.15 samples/sec Loss 6.3360 LearningRate 0.0416 Epoch: 13 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:39,242-Speed 3393.39 samples/sec Loss 6.1951 LearningRate 0.0415 Epoch: 13 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:42,270-Speed 3382.26 samples/sec Loss 6.1960 LearningRate 0.0415 Epoch: 13 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:45,254-Speed 3433.42 samples/sec Loss 6.1694 LearningRate 0.0415 Epoch: 13 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:48,231-Speed 3440.29 samples/sec Loss 6.1632 LearningRate 0.0415 Epoch: 13 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:51,211-Speed 3437.25 samples/sec Loss 6.1860 LearningRate 0.0415 Epoch: 13 Global Step: 68860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:54,191-Speed 3436.92 samples/sec Loss 6.2514 LearningRate 0.0415 Epoch: 13 Global Step: 68870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:02:57,178-Speed 3429.50 samples/sec Loss 6.1930 LearningRate 0.0415 Epoch: 13 Global Step: 68880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:03:00,218-Speed 3369.35 samples/sec Loss 6.1451 LearningRate 0.0414 Epoch: 13 Global Step: 68890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:03:03,211-Speed 3422.11 samples/sec Loss 6.2258 LearningRate 0.0414 Epoch: 13 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:03:06,184-Speed 3446.29 samples/sec Loss 6.2282 LearningRate 0.0414 Epoch: 13 Global Step: 68910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:09,174-Speed 3424.90 samples/sec Loss 6.1807 LearningRate 0.0414 Epoch: 13 Global Step: 68920 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-20 00:03:12,166-Speed 3423.64 samples/sec Loss 6.1553 LearningRate 0.0414 Epoch: 13 Global Step: 68930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:15,151-Speed 3431.13 samples/sec Loss 6.1627 LearningRate 0.0414 Epoch: 13 Global Step: 68940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:18,133-Speed 3434.96 samples/sec Loss 6.1734 LearningRate 0.0414 Epoch: 13 Global Step: 68950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:21,110-Speed 3440.21 samples/sec Loss 6.3909 LearningRate 0.0413 Epoch: 13 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:24,088-Speed 3439.52 samples/sec Loss 6.1051 LearningRate 0.0413 Epoch: 13 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:27,075-Speed 3428.56 samples/sec Loss 6.2532 LearningRate 0.0413 Epoch: 13 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:30,074-Speed 3416.72 samples/sec Loss 6.2961 LearningRate 0.0413 Epoch: 13 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:33,068-Speed 3420.46 samples/sec Loss 6.2739 LearningRate 0.0413 Epoch: 13 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:36,109-Speed 3369.00 samples/sec Loss 6.2850 LearningRate 0.0413 Epoch: 13 Global Step: 69010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:03:39,133-Speed 3386.17 samples/sec Loss 6.2438 LearningRate 0.0413 Epoch: 13 Global Step: 69020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:03:42,097-Speed 3456.15 samples/sec Loss 6.2364 LearningRate 0.0412 Epoch: 13 Global Step: 69030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:45,076-Speed 3437.90 samples/sec Loss 6.3103 LearningRate 0.0412 Epoch: 13 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:48,055-Speed 3439.01 
samples/sec Loss 6.1742 LearningRate 0.0412 Epoch: 13 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:51,038-Speed 3433.79 samples/sec Loss 6.2675 LearningRate 0.0412 Epoch: 13 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:54,054-Speed 3395.72 samples/sec Loss 6.0848 LearningRate 0.0412 Epoch: 13 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:03:57,036-Speed 3434.57 samples/sec Loss 6.2591 LearningRate 0.0412 Epoch: 13 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:00,274-Speed 3163.88 samples/sec Loss 6.0325 LearningRate 0.0412 Epoch: 13 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:03,365-Speed 3313.71 samples/sec Loss 6.0605 LearningRate 0.0411 Epoch: 13 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:06,360-Speed 3419.76 samples/sec Loss 6.2453 LearningRate 0.0411 Epoch: 13 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:09,421-Speed 3346.26 samples/sec Loss 6.2318 LearningRate 0.0411 Epoch: 13 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:12,451-Speed 3379.69 samples/sec Loss 6.1625 LearningRate 0.0411 Epoch: 13 Global Step: 69130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:04:15,522-Speed 3335.99 samples/sec Loss 6.1808 LearningRate 0.0411 Epoch: 13 Global Step: 69140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:04:18,549-Speed 3383.37 samples/sec Loss 6.2719 LearningRate 0.0411 Epoch: 13 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:04:21,528-Speed 3439.18 samples/sec Loss 6.0258 LearningRate 0.0411 Epoch: 13 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:04:24,511-Speed 3434.03 samples/sec Loss 6.0666 LearningRate 0.0410 Epoch: 
13 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:04:27,557-Speed 3362.23 samples/sec Loss 6.2594 LearningRate 0.0410 Epoch: 13 Global Step: 69180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:04:30,560-Speed 3410.53 samples/sec Loss 6.1842 LearningRate 0.0410 Epoch: 13 Global Step: 69190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:04:33,555-Speed 3419.64 samples/sec Loss 6.2199 LearningRate 0.0410 Epoch: 13 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:36,548-Speed 3422.94 samples/sec Loss 6.0896 LearningRate 0.0410 Epoch: 13 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:39,610-Speed 3345.36 samples/sec Loss 6.0304 LearningRate 0.0410 Epoch: 13 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:42,598-Speed 3426.98 samples/sec Loss 6.2929 LearningRate 0.0410 Epoch: 13 Global Step: 69230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:45,584-Speed 3430.77 samples/sec Loss 6.2520 LearningRate 0.0409 Epoch: 13 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:48,600-Speed 3395.93 samples/sec Loss 6.1193 LearningRate 0.0409 Epoch: 13 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:51,584-Speed 3433.24 samples/sec Loss 6.2233 LearningRate 0.0409 Epoch: 13 Global Step: 69260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:54,587-Speed 3411.00 samples/sec Loss 6.1110 LearningRate 0.0409 Epoch: 13 Global Step: 69270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:04:57,586-Speed 3414.91 samples/sec Loss 6.2568 LearningRate 0.0409 Epoch: 13 Global Step: 69280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:00,565-Speed 3438.59 samples/sec Loss 6.1957 LearningRate 0.0409 Epoch: 13 Global Step: 69290 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-20 00:05:03,552-Speed 3428.33 samples/sec Loss 6.1034 LearningRate 0.0409 Epoch: 13 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:05:06,558-Speed 3408.00 samples/sec Loss 6.1509 LearningRate 0.0408 Epoch: 13 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:05:09,523-Speed 3454.41 samples/sec Loss 6.2408 LearningRate 0.0408 Epoch: 13 Global Step: 69320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:12,504-Speed 3435.53 samples/sec Loss 6.1149 LearningRate 0.0408 Epoch: 13 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:15,485-Speed 3436.46 samples/sec Loss 6.2047 LearningRate 0.0408 Epoch: 13 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:18,472-Speed 3429.82 samples/sec Loss 6.2242 LearningRate 0.0408 Epoch: 13 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:21,450-Speed 3439.68 samples/sec Loss 6.1737 LearningRate 0.0408 Epoch: 13 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:24,433-Speed 3433.43 samples/sec Loss 6.2968 LearningRate 0.0408 Epoch: 13 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:27,430-Speed 3417.76 samples/sec Loss 6.1686 LearningRate 0.0407 Epoch: 13 Global Step: 69380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:30,456-Speed 3384.91 samples/sec Loss 6.1810 LearningRate 0.0407 Epoch: 13 Global Step: 69390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:05:33,425-Speed 3449.16 samples/sec Loss 6.2344 LearningRate 0.0407 Epoch: 13 Global Step: 69400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:05:36,431-Speed 3408.32 samples/sec Loss 6.2360 LearningRate 0.0407 Epoch: 13 Global Step: 69410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 
00:05:39,415-Speed 3432.44 samples/sec Loss 6.1984 LearningRate 0.0407 Epoch: 13 Global Step: 69420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:05:42,410-Speed 3419.03 samples/sec Loss 6.0964 LearningRate 0.0407 Epoch: 13 Global Step: 69430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:05:45,423-Speed 3400.31 samples/sec Loss 6.1868 LearningRate 0.0407 Epoch: 13 Global Step: 69440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:05:48,555-Speed 3269.98 samples/sec Loss 6.0480 LearningRate 0.0406 Epoch: 13 Global Step: 69450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:05:51,567-Speed 3401.31 samples/sec Loss 6.3216 LearningRate 0.0406 Epoch: 13 Global Step: 69460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:05:54,639-Speed 3334.34 samples/sec Loss 6.0584 LearningRate 0.0406 Epoch: 13 Global Step: 69470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:05:57,650-Speed 3401.07 samples/sec Loss 6.1486 LearningRate 0.0406 Epoch: 13 Global Step: 69480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:06:00,639-Speed 3427.55 samples/sec Loss 6.2181 LearningRate 0.0406 Epoch: 13 Global Step: 69490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:06:03,631-Speed 3422.98 samples/sec Loss 6.0517 LearningRate 0.0406 Epoch: 13 Global Step: 69500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:06,621-Speed 3425.21 samples/sec Loss 6.1321 LearningRate 0.0406 Epoch: 13 Global Step: 69510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:09,603-Speed 3435.09 samples/sec Loss 6.1924 LearningRate 0.0405 Epoch: 13 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:12,617-Speed 3397.85 samples/sec Loss 6.0119 LearningRate 0.0405 Epoch: 13 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:15,687-Speed 3337.34 samples/sec Loss 6.1079 
LearningRate 0.0405 Epoch: 13 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:18,716-Speed 3381.11 samples/sec Loss 6.0791 LearningRate 0.0405 Epoch: 13 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:21,784-Speed 3338.93 samples/sec Loss 6.0508 LearningRate 0.0405 Epoch: 13 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:24,771-Speed 3429.02 samples/sec Loss 6.2445 LearningRate 0.0405 Epoch: 13 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:27,760-Speed 3427.24 samples/sec Loss 6.1381 LearningRate 0.0405 Epoch: 13 Global Step: 69580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:30,741-Speed 3435.45 samples/sec Loss 6.2530 LearningRate 0.0404 Epoch: 13 Global Step: 69590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:33,714-Speed 3445.78 samples/sec Loss 6.0587 LearningRate 0.0404 Epoch: 13 Global Step: 69600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:36,699-Speed 3430.92 samples/sec Loss 6.3043 LearningRate 0.0404 Epoch: 13 Global Step: 69610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:39,691-Speed 3423.76 samples/sec Loss 6.1452 LearningRate 0.0404 Epoch: 13 Global Step: 69620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:42,672-Speed 3435.62 samples/sec Loss 6.1211 LearningRate 0.0404 Epoch: 13 Global Step: 69630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:45,651-Speed 3439.25 samples/sec Loss 6.0788 LearningRate 0.0404 Epoch: 13 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:48,733-Speed 3323.36 samples/sec Loss 6.1053 LearningRate 0.0404 Epoch: 13 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:51,720-Speed 3428.96 samples/sec Loss 6.2366 LearningRate 0.0403 Epoch: 13 Global Step: 69660 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:54,712-Speed 3423.44 samples/sec Loss 6.2855 LearningRate 0.0403 Epoch: 13 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:06:57,724-Speed 3399.90 samples/sec Loss 5.9876 LearningRate 0.0403 Epoch: 13 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:00,742-Speed 3394.14 samples/sec Loss 6.1439 LearningRate 0.0403 Epoch: 13 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:03,720-Speed 3439.13 samples/sec Loss 6.0516 LearningRate 0.0403 Epoch: 13 Global Step: 69700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:07:06,702-Speed 3435.64 samples/sec Loss 6.0855 LearningRate 0.0403 Epoch: 13 Global Step: 69710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:07:09,666-Speed 3455.73 samples/sec Loss 6.1848 LearningRate 0.0403 Epoch: 13 Global Step: 69720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:12,646-Speed 3437.02 samples/sec Loss 5.9321 LearningRate 0.0402 Epoch: 13 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:15,645-Speed 3415.50 samples/sec Loss 6.1989 LearningRate 0.0402 Epoch: 13 Global Step: 69740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:18,630-Speed 3431.45 samples/sec Loss 6.0989 LearningRate 0.0402 Epoch: 13 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:21,624-Speed 3421.73 samples/sec Loss 6.3188 LearningRate 0.0402 Epoch: 13 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:24,647-Speed 3387.53 samples/sec Loss 6.1749 LearningRate 0.0402 Epoch: 13 Global Step: 69770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:27,742-Speed 3309.73 samples/sec Loss 5.9903 LearningRate 0.0402 Epoch: 13 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-20 00:07:30,731-Speed 3427.60 samples/sec Loss 5.9363 LearningRate 0.0402 Epoch: 13 Global Step: 69790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:33,718-Speed 3428.04 samples/sec Loss 6.0789 LearningRate 0.0401 Epoch: 13 Global Step: 69800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:36,734-Speed 3397.12 samples/sec Loss 6.1958 LearningRate 0.0401 Epoch: 13 Global Step: 69810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:39,726-Speed 3423.00 samples/sec Loss 5.9522 LearningRate 0.0401 Epoch: 13 Global Step: 69820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:07:42,699-Speed 3445.67 samples/sec Loss 6.1525 LearningRate 0.0401 Epoch: 13 Global Step: 69830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:45,713-Speed 3398.07 samples/sec Loss 5.9073 LearningRate 0.0401 Epoch: 13 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:48,777-Speed 3343.49 samples/sec Loss 6.2231 LearningRate 0.0401 Epoch: 13 Global Step: 69850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:51,878-Speed 3303.03 samples/sec Loss 6.1592 LearningRate 0.0401 Epoch: 13 Global Step: 69860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:54,967-Speed 3315.59 samples/sec Loss 6.1799 LearningRate 0.0400 Epoch: 13 Global Step: 69870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:07:57,961-Speed 3421.60 samples/sec Loss 6.1655 LearningRate 0.0400 Epoch: 13 Global Step: 69880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:08:00,957-Speed 3418.94 samples/sec Loss 6.1321 LearningRate 0.0400 Epoch: 13 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:08:04,012-Speed 3352.67 samples/sec Loss 6.0479 LearningRate 0.0400 Epoch: 13 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:08:07,052-Speed 3369.11 samples/sec Loss 
5.9686 LearningRate 0.0400 Epoch: 13 Global Step: 69910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:10,070-Speed 3394.47 samples/sec Loss 6.0282 LearningRate 0.0400 Epoch: 13 Global Step: 69920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:13,069-Speed 3415.44 samples/sec Loss 6.0527 LearningRate 0.0400 Epoch: 13 Global Step: 69930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:16,058-Speed 3426.84 samples/sec Loss 5.9664 LearningRate 0.0399 Epoch: 13 Global Step: 69940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:19,053-Speed 3419.89 samples/sec Loss 6.1349 LearningRate 0.0399 Epoch: 13 Global Step: 69950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:22,033-Speed 3436.46 samples/sec Loss 6.1169 LearningRate 0.0399 Epoch: 13 Global Step: 69960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:25,016-Speed 3433.97 samples/sec Loss 6.1091 LearningRate 0.0399 Epoch: 13 Global Step: 69970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:28,055-Speed 3370.43 samples/sec Loss 6.0788 LearningRate 0.0399 Epoch: 13 Global Step: 69980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:31,034-Speed 3438.85 samples/sec Loss 6.1266 LearningRate 0.0399 Epoch: 13 Global Step: 69990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:08:34,060-Speed 3385.36 samples/sec Loss 6.0770 LearningRate 0.0399 Epoch: 13 Global Step: 70000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:09:17,004-[lfw][70000]XNorm: 22.713133 Training: 2022-01-20 00:09:17,005-[lfw][70000]Accuracy-Flip: 0.99717+-0.00236 Training: 2022-01-20 00:09:17,005-[lfw][70000]Accuracy-Highest: 0.99767 Training: 2022-01-20 00:10:07,132-[cfp_fp][70000]XNorm: 20.070130 Training: 2022-01-20 00:10:07,133-[cfp_fp][70000]Accuracy-Flip: 0.97543+-0.00638 Training: 2022-01-20 00:10:07,133-[cfp_fp][70000]Accuracy-Highest: 0.97543 Training: 
2022-01-20 00:10:50,299-[agedb_30][70000]XNorm: 22.290146 Training: 2022-01-20 00:10:50,299-[agedb_30][70000]Accuracy-Flip: 0.97517+-0.00751 Training: 2022-01-20 00:10:50,300-[agedb_30][70000]Accuracy-Highest: 0.97800 Training: 2022-01-20 00:10:53,372-Speed 73.50 samples/sec Loss 6.0527 LearningRate 0.0398 Epoch: 13 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:10:56,335-Speed 3456.59 samples/sec Loss 6.0629 LearningRate 0.0398 Epoch: 13 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:10:59,309-Speed 3443.44 samples/sec Loss 6.0892 LearningRate 0.0398 Epoch: 13 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:02,300-Speed 3425.68 samples/sec Loss 5.9864 LearningRate 0.0398 Epoch: 13 Global Step: 70040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:05,259-Speed 3460.72 samples/sec Loss 5.9746 LearningRate 0.0398 Epoch: 13 Global Step: 70050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:08,242-Speed 3434.06 samples/sec Loss 6.1115 LearningRate 0.0398 Epoch: 13 Global Step: 70060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:11,380-Speed 3264.14 samples/sec Loss 6.1436 LearningRate 0.0398 Epoch: 13 Global Step: 70070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:14,536-Speed 3245.42 samples/sec Loss 5.8996 LearningRate 0.0397 Epoch: 13 Global Step: 70080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:17,550-Speed 3398.08 samples/sec Loss 6.2089 LearningRate 0.0397 Epoch: 13 Global Step: 70090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:20,608-Speed 3349.55 samples/sec Loss 6.2075 LearningRate 0.0397 Epoch: 13 Global Step: 70100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:23,588-Speed 3436.62 samples/sec Loss 6.0866 LearningRate 0.0397 Epoch: 13 Global Step: 70110 Fp16 Grad Scale: 32768 Required: 6 hours 
Training: 2022-01-20 00:11:26,773-Speed 3216.80 samples/sec Loss 6.0589 LearningRate 0.0397 Epoch: 13 Global Step: 70120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:29,806-Speed 3376.96 samples/sec Loss 6.0282 LearningRate 0.0397 Epoch: 13 Global Step: 70130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:32,787-Speed 3436.62 samples/sec Loss 5.9047 LearningRate 0.0397 Epoch: 13 Global Step: 70140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:11:35,763-Speed 3441.09 samples/sec Loss 6.1893 LearningRate 0.0396 Epoch: 13 Global Step: 70150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:38,824-Speed 3346.07 samples/sec Loss 5.9776 LearningRate 0.0396 Epoch: 13 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:41,822-Speed 3417.20 samples/sec Loss 6.2185 LearningRate 0.0396 Epoch: 13 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:44,801-Speed 3439.13 samples/sec Loss 6.0229 LearningRate 0.0396 Epoch: 13 Global Step: 70180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:47,813-Speed 3400.12 samples/sec Loss 6.1595 LearningRate 0.0396 Epoch: 13 Global Step: 70190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:50,847-Speed 3375.61 samples/sec Loss 6.0668 LearningRate 0.0396 Epoch: 13 Global Step: 70200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:53,935-Speed 3317.07 samples/sec Loss 6.0246 LearningRate 0.0396 Epoch: 13 Global Step: 70210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:56,986-Speed 3357.89 samples/sec Loss 6.1773 LearningRate 0.0395 Epoch: 13 Global Step: 70220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:11:59,973-Speed 3428.62 samples/sec Loss 6.0316 LearningRate 0.0395 Epoch: 13 Global Step: 70230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:12:02,979-Speed 3408.33 
samples/sec Loss 6.1846 LearningRate 0.0395 Epoch: 13 Global Step: 70240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:12:05,957-Speed 3439.29 samples/sec Loss 6.1306 LearningRate 0.0395 Epoch: 13 Global Step: 70250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:12:08,942-Speed 3430.94 samples/sec Loss 6.1120 LearningRate 0.0395 Epoch: 13 Global Step: 70260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:12:11,924-Speed 3435.12 samples/sec Loss 6.1130 LearningRate 0.0395 Epoch: 13 Global Step: 70270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:12:14,913-Speed 3427.01 samples/sec Loss 6.1307 LearningRate 0.0395 Epoch: 13 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:12:17,898-Speed 3431.06 samples/sec Loss 5.9113 LearningRate 0.0394 Epoch: 13 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:12:20,887-Speed 3427.17 samples/sec Loss 5.8929 LearningRate 0.0394 Epoch: 13 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:12:23,918-Speed 3379.56 samples/sec Loss 6.0711 LearningRate 0.0394 Epoch: 13 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:12:26,936-Speed 3394.10 samples/sec Loss 6.1191 LearningRate 0.0394 Epoch: 13 Global Step: 70320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:12:29,943-Speed 3405.43 samples/sec Loss 5.9483 LearningRate 0.0394 Epoch: 13 Global Step: 70330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:32,914-Speed 3448.01 samples/sec Loss 6.0010 LearningRate 0.0394 Epoch: 13 Global Step: 70340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:35,903-Speed 3426.43 samples/sec Loss 6.1169 LearningRate 0.0394 Epoch: 13 Global Step: 70350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:38,877-Speed 3444.65 samples/sec Loss 6.1474 LearningRate 0.0394 Epoch: 13 
Global Step: 70360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:41,884-Speed 3406.32 samples/sec Loss 5.9675 LearningRate 0.0393 Epoch: 13 Global Step: 70370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:44,894-Speed 3402.60 samples/sec Loss 6.0437 LearningRate 0.0393 Epoch: 13 Global Step: 70380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:47,867-Speed 3446.46 samples/sec Loss 6.0397 LearningRate 0.0393 Epoch: 13 Global Step: 70390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:50,842-Speed 3442.69 samples/sec Loss 6.0127 LearningRate 0.0393 Epoch: 13 Global Step: 70400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:53,859-Speed 3394.57 samples/sec Loss 6.1390 LearningRate 0.0393 Epoch: 13 Global Step: 70410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:12:56,855-Speed 3419.28 samples/sec Loss 6.1016 LearningRate 0.0393 Epoch: 13 Global Step: 70420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:13:00,035-Speed 3220.45 samples/sec Loss 6.1474 LearningRate 0.0393 Epoch: 13 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:03,040-Speed 3408.84 samples/sec Loss 5.8420 LearningRate 0.0392 Epoch: 13 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:06,043-Speed 3410.41 samples/sec Loss 5.9881 LearningRate 0.0392 Epoch: 13 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:09,128-Speed 3320.51 samples/sec Loss 6.0565 LearningRate 0.0392 Epoch: 13 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:12,105-Speed 3440.88 samples/sec Loss 6.1661 LearningRate 0.0392 Epoch: 13 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:15,090-Speed 3431.87 samples/sec Loss 5.8184 LearningRate 0.0392 Epoch: 13 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-01-20 00:13:18,068-Speed 3440.76 samples/sec Loss 5.9077 LearningRate 0.0392 Epoch: 13 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:21,055-Speed 3430.23 samples/sec Loss 6.0180 LearningRate 0.0392 Epoch: 13 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:24,031-Speed 3440.84 samples/sec Loss 6.1028 LearningRate 0.0391 Epoch: 13 Global Step: 70510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:28,414-Speed 2336.86 samples/sec Loss 6.0821 LearningRate 0.0391 Epoch: 13 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:31,694-Speed 3122.50 samples/sec Loss 5.8601 LearningRate 0.0391 Epoch: 13 Global Step: 70530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:13:34,786-Speed 3312.54 samples/sec Loss 6.1124 LearningRate 0.0391 Epoch: 13 Global Step: 70540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:13:37,796-Speed 3403.03 samples/sec Loss 6.0069 LearningRate 0.0391 Epoch: 13 Global Step: 70550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:13:40,786-Speed 3425.03 samples/sec Loss 6.0477 LearningRate 0.0391 Epoch: 13 Global Step: 70560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:43,785-Speed 3416.60 samples/sec Loss 6.1826 LearningRate 0.0391 Epoch: 13 Global Step: 70570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:46,783-Speed 3416.74 samples/sec Loss 5.9857 LearningRate 0.0390 Epoch: 13 Global Step: 70580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:49,808-Speed 3385.31 samples/sec Loss 5.9714 LearningRate 0.0390 Epoch: 13 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:52,833-Speed 3386.72 samples/sec Loss 6.1981 LearningRate 0.0390 Epoch: 13 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:55,831-Speed 
3415.74 samples/sec Loss 6.1177 LearningRate 0.0390 Epoch: 13 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:13:58,813-Speed 3434.92 samples/sec Loss 6.0633 LearningRate 0.0390 Epoch: 13 Global Step: 70620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:01,792-Speed 3438.63 samples/sec Loss 5.9207 LearningRate 0.0390 Epoch: 13 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:04,773-Speed 3435.83 samples/sec Loss 6.0485 LearningRate 0.0390 Epoch: 13 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:07,750-Speed 3441.08 samples/sec Loss 5.9646 LearningRate 0.0389 Epoch: 13 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:10,744-Speed 3421.17 samples/sec Loss 5.9647 LearningRate 0.0389 Epoch: 13 Global Step: 70660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:14:13,739-Speed 3419.89 samples/sec Loss 6.1478 LearningRate 0.0389 Epoch: 13 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:16,794-Speed 3353.43 samples/sec Loss 5.9153 LearningRate 0.0389 Epoch: 13 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:19,774-Speed 3436.09 samples/sec Loss 6.0657 LearningRate 0.0389 Epoch: 13 Global Step: 70690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:22,763-Speed 3427.17 samples/sec Loss 5.8629 LearningRate 0.0389 Epoch: 13 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:25,803-Speed 3369.56 samples/sec Loss 6.0832 LearningRate 0.0389 Epoch: 13 Global Step: 70710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:28,830-Speed 3383.40 samples/sec Loss 6.2293 LearningRate 0.0388 Epoch: 13 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:31,861-Speed 3379.05 samples/sec Loss 6.1718 LearningRate 0.0388 
Epoch: 13 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:34,839-Speed 3439.79 samples/sec Loss 6.0428 LearningRate 0.0388 Epoch: 13 Global Step: 70740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:37,818-Speed 3439.35 samples/sec Loss 5.9617 LearningRate 0.0388 Epoch: 13 Global Step: 70750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:40,819-Speed 3412.22 samples/sec Loss 6.0376 LearningRate 0.0388 Epoch: 13 Global Step: 70760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:43,793-Speed 3444.14 samples/sec Loss 6.2344 LearningRate 0.0388 Epoch: 13 Global Step: 70770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:14:46,758-Speed 3454.58 samples/sec Loss 6.0175 LearningRate 0.0388 Epoch: 13 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:49,733-Speed 3442.93 samples/sec Loss 6.0944 LearningRate 0.0388 Epoch: 13 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:14:52,697-Speed 3456.22 samples/sec Loss 6.1007 LearningRate 0.0387 Epoch: 13 Global Step: 70800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:14:55,821-Speed 3278.90 samples/sec Loss 6.0282 LearningRate 0.0387 Epoch: 13 Global Step: 70810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:15:10,788-Speed 684.19 samples/sec Loss 5.4353 LearningRate 0.0387 Epoch: 14 Global Step: 70820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:15:13,858-Speed 3337.78 samples/sec Loss 5.2828 LearningRate 0.0387 Epoch: 14 Global Step: 70830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:15:17,005-Speed 3254.30 samples/sec Loss 5.1410 LearningRate 0.0387 Epoch: 14 Global Step: 70840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:15:20,184-Speed 3222.68 samples/sec Loss 5.3195 LearningRate 0.0387 Epoch: 14 Global Step: 70850 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-20 00:15:23,173-Speed 3426.62 samples/sec Loss 5.2829 LearningRate 0.0387 Epoch: 14 Global Step: 70860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:15:26,213-Speed 3368.93 samples/sec Loss 5.3165 LearningRate 0.0386 Epoch: 14 Global Step: 70870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:15:29,269-Speed 3351.53 samples/sec Loss 5.2296 LearningRate 0.0386 Epoch: 14 Global Step: 70880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:15:32,245-Speed 3442.43 samples/sec Loss 5.3170 LearningRate 0.0386 Epoch: 14 Global Step: 70890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:15:35,254-Speed 3403.24 samples/sec Loss 5.1827 LearningRate 0.0386 Epoch: 14 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:15:38,264-Speed 3403.13 samples/sec Loss 5.2417 LearningRate 0.0386 Epoch: 14 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:15:41,273-Speed 3404.52 samples/sec Loss 5.5135 LearningRate 0.0386 Epoch: 14 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:15:44,251-Speed 3438.70 samples/sec Loss 5.3846 LearningRate 0.0386 Epoch: 14 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:15:47,305-Speed 3354.46 samples/sec Loss 5.3632 LearningRate 0.0385 Epoch: 14 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:15:50,296-Speed 3424.12 samples/sec Loss 5.3712 LearningRate 0.0385 Epoch: 14 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:15:53,270-Speed 3444.44 samples/sec Loss 5.2979 LearningRate 0.0385 Epoch: 14 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:15:56,269-Speed 3415.24 samples/sec Loss 5.3622 LearningRate 0.0385 Epoch: 14 Global Step: 70970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 
00:15:59,248-Speed 3438.54 samples/sec Loss 5.2594 LearningRate 0.0385 Epoch: 14 Global Step: 70980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:02,273-Speed 3386.01 samples/sec Loss 5.3090 LearningRate 0.0385 Epoch: 14 Global Step: 70990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:05,236-Speed 3457.09 samples/sec Loss 5.3412 LearningRate 0.0385 Epoch: 14 Global Step: 71000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:08,257-Speed 3390.58 samples/sec Loss 5.3855 LearningRate 0.0384 Epoch: 14 Global Step: 71010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:11,290-Speed 3377.63 samples/sec Loss 5.4457 LearningRate 0.0384 Epoch: 14 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:14,313-Speed 3388.14 samples/sec Loss 5.4122 LearningRate 0.0384 Epoch: 14 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:17,300-Speed 3429.57 samples/sec Loss 5.4216 LearningRate 0.0384 Epoch: 14 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:20,345-Speed 3363.60 samples/sec Loss 5.3730 LearningRate 0.0384 Epoch: 14 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:23,365-Speed 3391.49 samples/sec Loss 5.4434 LearningRate 0.0384 Epoch: 14 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:26,360-Speed 3420.05 samples/sec Loss 5.3665 LearningRate 0.0384 Epoch: 14 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:29,372-Speed 3400.62 samples/sec Loss 5.4139 LearningRate 0.0383 Epoch: 14 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:32,373-Speed 3413.61 samples/sec Loss 5.4500 LearningRate 0.0383 Epoch: 14 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:35,351-Speed 3438.44 samples/sec Loss 5.5335 
LearningRate 0.0383 Epoch: 14 Global Step: 71100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:16:38,336-Speed 3431.72 samples/sec Loss 5.3533 LearningRate 0.0383 Epoch: 14 Global Step: 71110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:16:41,321-Speed 3431.47 samples/sec Loss 5.4364 LearningRate 0.0383 Epoch: 14 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:44,328-Speed 3406.57 samples/sec Loss 5.4569 LearningRate 0.0383 Epoch: 14 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:47,435-Speed 3296.50 samples/sec Loss 5.5970 LearningRate 0.0383 Epoch: 14 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:50,454-Speed 3392.45 samples/sec Loss 5.5834 LearningRate 0.0383 Epoch: 14 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:53,454-Speed 3415.34 samples/sec Loss 5.5744 LearningRate 0.0382 Epoch: 14 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:56,434-Speed 3436.92 samples/sec Loss 5.4775 LearningRate 0.0382 Epoch: 14 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:16:59,421-Speed 3428.54 samples/sec Loss 5.5170 LearningRate 0.0382 Epoch: 14 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:02,517-Speed 3308.83 samples/sec Loss 5.2550 LearningRate 0.0382 Epoch: 14 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:05,542-Speed 3386.48 samples/sec Loss 5.4449 LearningRate 0.0382 Epoch: 14 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:08,645-Speed 3300.68 samples/sec Loss 5.4686 LearningRate 0.0382 Epoch: 14 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:11,652-Speed 3405.71 samples/sec Loss 5.5622 LearningRate 0.0382 Epoch: 14 Global Step: 71220 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:17:14,665-Speed 3399.61 samples/sec Loss 5.5571 LearningRate 0.0381 Epoch: 14 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:17,665-Speed 3414.92 samples/sec Loss 5.6032 LearningRate 0.0381 Epoch: 14 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:20,644-Speed 3437.86 samples/sec Loss 5.4896 LearningRate 0.0381 Epoch: 14 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:23,635-Speed 3424.99 samples/sec Loss 5.5042 LearningRate 0.0381 Epoch: 14 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:26,619-Speed 3432.60 samples/sec Loss 5.5975 LearningRate 0.0381 Epoch: 14 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:29,607-Speed 3427.26 samples/sec Loss 5.5077 LearningRate 0.0381 Epoch: 14 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:32,663-Speed 3352.16 samples/sec Loss 5.5721 LearningRate 0.0381 Epoch: 14 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:17:35,747-Speed 3320.95 samples/sec Loss 5.6461 LearningRate 0.0380 Epoch: 14 Global Step: 71300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:17:38,792-Speed 3364.28 samples/sec Loss 5.4120 LearningRate 0.0380 Epoch: 14 Global Step: 71310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:17:41,799-Speed 3405.60 samples/sec Loss 5.4767 LearningRate 0.0380 Epoch: 14 Global Step: 71320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:17:44,790-Speed 3425.04 samples/sec Loss 5.6045 LearningRate 0.0380 Epoch: 14 Global Step: 71330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:17:47,773-Speed 3434.75 samples/sec Loss 5.5115 LearningRate 0.0380 Epoch: 14 Global Step: 71340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-01-20 00:17:50,790-Speed 3394.79 samples/sec Loss 5.4908 LearningRate 0.0380 Epoch: 14 Global Step: 71350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:17:53,889-Speed 3305.29 samples/sec Loss 5.6758 LearningRate 0.0380 Epoch: 14 Global Step: 71360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:17:56,920-Speed 3378.94 samples/sec Loss 5.7137 LearningRate 0.0379 Epoch: 14 Global Step: 71370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:17:59,906-Speed 3430.55 samples/sec Loss 5.6863 LearningRate 0.0379 Epoch: 14 Global Step: 71380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:02,892-Speed 3429.79 samples/sec Loss 5.4734 LearningRate 0.0379 Epoch: 14 Global Step: 71390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:05,889-Speed 3417.81 samples/sec Loss 5.5992 LearningRate 0.0379 Epoch: 14 Global Step: 71400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:18:08,876-Speed 3429.22 samples/sec Loss 5.5649 LearningRate 0.0379 Epoch: 14 Global Step: 71410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:18:11,857-Speed 3436.65 samples/sec Loss 5.6757 LearningRate 0.0379 Epoch: 14 Global Step: 71420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:18:14,904-Speed 3361.96 samples/sec Loss 5.6632 LearningRate 0.0379 Epoch: 14 Global Step: 71430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:18:17,872-Speed 3450.27 samples/sec Loss 5.6467 LearningRate 0.0379 Epoch: 14 Global Step: 71440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:20,902-Speed 3380.37 samples/sec Loss 5.5789 LearningRate 0.0378 Epoch: 14 Global Step: 71450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:23,895-Speed 3422.62 samples/sec Loss 5.6188 LearningRate 0.0378 Epoch: 14 Global Step: 71460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:26,894-Speed 3415.45 samples/sec Loss 
5.5905 LearningRate 0.0378 Epoch: 14 Global Step: 71470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:30,067-Speed 3227.87 samples/sec Loss 5.6592 LearningRate 0.0378 Epoch: 14 Global Step: 71480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:33,118-Speed 3357.65 samples/sec Loss 5.5502 LearningRate 0.0378 Epoch: 14 Global Step: 71490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:36,168-Speed 3357.80 samples/sec Loss 5.5353 LearningRate 0.0378 Epoch: 14 Global Step: 71500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:39,160-Speed 3423.69 samples/sec Loss 5.6388 LearningRate 0.0378 Epoch: 14 Global Step: 71510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:42,198-Speed 3371.34 samples/sec Loss 5.5772 LearningRate 0.0377 Epoch: 14 Global Step: 71520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:45,195-Speed 3417.74 samples/sec Loss 5.4891 LearningRate 0.0377 Epoch: 14 Global Step: 71530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:18:48,178-Speed 3433.01 samples/sec Loss 5.5261 LearningRate 0.0377 Epoch: 14 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:18:51,160-Speed 3434.75 samples/sec Loss 5.6671 LearningRate 0.0377 Epoch: 14 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:18:54,140-Speed 3437.45 samples/sec Loss 5.6852 LearningRate 0.0377 Epoch: 14 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:18:57,129-Speed 3426.75 samples/sec Loss 5.6647 LearningRate 0.0377 Epoch: 14 Global Step: 71570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:00,115-Speed 3429.79 samples/sec Loss 5.6043 LearningRate 0.0377 Epoch: 14 Global Step: 71580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:03,100-Speed 3431.81 samples/sec Loss 5.8283 LearningRate 0.0376 Epoch: 14 Global Step: 71590 
Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:06,104-Speed 3409.57 samples/sec Loss 5.6687 LearningRate 0.0376 Epoch: 14 Global Step: 71600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:09,087-Speed 3434.85 samples/sec Loss 5.7506 LearningRate 0.0376 Epoch: 14 Global Step: 71610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:12,117-Speed 3380.52 samples/sec Loss 5.6805 LearningRate 0.0376 Epoch: 14 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:15,102-Speed 3430.68 samples/sec Loss 5.6804 LearningRate 0.0376 Epoch: 14 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:18,086-Speed 3433.16 samples/sec Loss 5.6113 LearningRate 0.0376 Epoch: 14 Global Step: 71640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:19:21,226-Speed 3261.83 samples/sec Loss 5.5608 LearningRate 0.0376 Epoch: 14 Global Step: 71650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:19:24,278-Speed 3356.47 samples/sec Loss 5.7506 LearningRate 0.0375 Epoch: 14 Global Step: 71660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:19:27,275-Speed 3418.33 samples/sec Loss 5.6337 LearningRate 0.0375 Epoch: 14 Global Step: 71670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:19:30,272-Speed 3417.08 samples/sec Loss 5.7735 LearningRate 0.0375 Epoch: 14 Global Step: 71680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:19:33,283-Speed 3402.62 samples/sec Loss 5.7614 LearningRate 0.0375 Epoch: 14 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:36,313-Speed 3380.22 samples/sec Loss 5.6828 LearningRate 0.0375 Epoch: 14 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:39,294-Speed 3436.05 samples/sec Loss 5.5231 LearningRate 0.0375 Epoch: 14 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-20 00:19:42,291-Speed 3418.44 samples/sec Loss 5.7378 LearningRate 0.0375 Epoch: 14 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:45,287-Speed 3418.04 samples/sec Loss 5.8452 LearningRate 0.0375 Epoch: 14 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:48,298-Speed 3401.79 samples/sec Loss 5.7272 LearningRate 0.0374 Epoch: 14 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:51,305-Speed 3405.91 samples/sec Loss 5.6156 LearningRate 0.0374 Epoch: 14 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:19:54,426-Speed 3281.74 samples/sec Loss 5.7978 LearningRate 0.0374 Epoch: 14 Global Step: 71760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:19:57,423-Speed 3418.27 samples/sec Loss 5.6402 LearningRate 0.0374 Epoch: 14 Global Step: 71770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:00,416-Speed 3422.34 samples/sec Loss 5.5969 LearningRate 0.0374 Epoch: 14 Global Step: 71780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:03,541-Speed 3278.11 samples/sec Loss 5.7829 LearningRate 0.0374 Epoch: 14 Global Step: 71790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:06,681-Speed 3262.00 samples/sec Loss 5.8150 LearningRate 0.0374 Epoch: 14 Global Step: 71800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:09,796-Speed 3287.66 samples/sec Loss 5.6542 LearningRate 0.0373 Epoch: 14 Global Step: 71810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:12,860-Speed 3342.54 samples/sec Loss 5.7126 LearningRate 0.0373 Epoch: 14 Global Step: 71820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:15,983-Speed 3279.75 samples/sec Loss 5.6344 LearningRate 0.0373 Epoch: 14 Global Step: 71830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:19,012-Speed 3382.44 samples/sec Loss 
5.8017 LearningRate 0.0373 Epoch: 14 Global Step: 71840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:22,013-Speed 3412.40 samples/sec Loss 5.8193 LearningRate 0.0373 Epoch: 14 Global Step: 71850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:20:24,999-Speed 3430.09 samples/sec Loss 5.9044 LearningRate 0.0373 Epoch: 14 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:27,999-Speed 3415.10 samples/sec Loss 5.7338 LearningRate 0.0373 Epoch: 14 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:31,012-Speed 3399.75 samples/sec Loss 5.8904 LearningRate 0.0372 Epoch: 14 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:34,019-Speed 3406.63 samples/sec Loss 5.6643 LearningRate 0.0372 Epoch: 14 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:37,000-Speed 3435.81 samples/sec Loss 5.7367 LearningRate 0.0372 Epoch: 14 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:39,990-Speed 3424.51 samples/sec Loss 5.9585 LearningRate 0.0372 Epoch: 14 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:43,066-Speed 3330.85 samples/sec Loss 5.6743 LearningRate 0.0372 Epoch: 14 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:46,069-Speed 3410.65 samples/sec Loss 5.6952 LearningRate 0.0372 Epoch: 14 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:49,056-Speed 3429.64 samples/sec Loss 5.7540 LearningRate 0.0372 Epoch: 14 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:52,104-Speed 3360.34 samples/sec Loss 5.6059 LearningRate 0.0372 Epoch: 14 Global Step: 71950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:55,111-Speed 3406.06 samples/sec Loss 5.9091 LearningRate 0.0371 Epoch: 14 Global Step: 71960 
Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:20:58,121-Speed 3403.74 samples/sec Loss 5.7967 LearningRate 0.0371 Epoch: 14 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:21:01,106-Speed 3431.71 samples/sec Loss 5.5415 LearningRate 0.0371 Epoch: 14 Global Step: 71980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:21:04,094-Speed 3429.08 samples/sec Loss 5.7590 LearningRate 0.0371 Epoch: 14 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:21:07,105-Speed 3401.09 samples/sec Loss 5.7458 LearningRate 0.0371 Epoch: 14 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:21:50,299-[lfw][72000]XNorm: 23.501909 Training: 2022-01-20 00:21:50,300-[lfw][72000]Accuracy-Flip: 0.99700+-0.00340 Training: 2022-01-20 00:21:50,301-[lfw][72000]Accuracy-Highest: 0.99767 Training: 2022-01-20 00:22:40,403-[cfp_fp][72000]XNorm: 21.007811 Training: 2022-01-20 00:22:40,403-[cfp_fp][72000]Accuracy-Flip: 0.97443+-0.01106 Training: 2022-01-20 00:22:40,404-[cfp_fp][72000]Accuracy-Highest: 0.97543 Training: 2022-01-20 00:23:23,537-[agedb_30][72000]XNorm: 23.336441 Training: 2022-01-20 00:23:23,538-[agedb_30][72000]Accuracy-Flip: 0.97950+-0.00799 Training: 2022-01-20 00:23:23,539-[agedb_30][72000]Accuracy-Highest: 0.97950 Training: 2022-01-20 00:23:26,535-Speed 73.44 samples/sec Loss 5.8514 LearningRate 0.0371 Epoch: 14 Global Step: 72010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:29,525-Speed 3425.85 samples/sec Loss 5.7797 LearningRate 0.0371 Epoch: 14 Global Step: 72020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:32,500-Speed 3442.51 samples/sec Loss 5.7816 LearningRate 0.0370 Epoch: 14 Global Step: 72030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:35,479-Speed 3438.87 samples/sec Loss 5.7059 LearningRate 0.0370 Epoch: 14 Global Step: 72040 Fp16 Grad Scale: 32768 Required: 6 hours 
Training: 2022-01-20 00:23:38,506-Speed 3383.48 samples/sec Loss 5.7847 LearningRate 0.0370 Epoch: 14 Global Step: 72050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:41,653-Speed 3254.41 samples/sec Loss 5.6814 LearningRate 0.0370 Epoch: 14 Global Step: 72060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:44,753-Speed 3304.86 samples/sec Loss 5.5860 LearningRate 0.0370 Epoch: 14 Global Step: 72070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:47,734-Speed 3435.73 samples/sec Loss 5.7011 LearningRate 0.0370 Epoch: 14 Global Step: 72080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:50,837-Speed 3300.58 samples/sec Loss 5.6680 LearningRate 0.0370 Epoch: 14 Global Step: 72090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:53,885-Speed 3360.24 samples/sec Loss 5.6991 LearningRate 0.0369 Epoch: 14 Global Step: 72100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:23:56,900-Speed 3397.64 samples/sec Loss 5.7341 LearningRate 0.0369 Epoch: 14 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:23:59,881-Speed 3437.29 samples/sec Loss 5.7537 LearningRate 0.0369 Epoch: 14 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:02,902-Speed 3390.14 samples/sec Loss 5.7881 LearningRate 0.0369 Epoch: 14 Global Step: 72130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:05,921-Speed 3392.34 samples/sec Loss 5.6184 LearningRate 0.0369 Epoch: 14 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:08,907-Speed 3430.71 samples/sec Loss 5.6500 LearningRate 0.0369 Epoch: 14 Global Step: 72150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:11,893-Speed 3429.64 samples/sec Loss 5.7033 LearningRate 0.0369 Epoch: 14 Global Step: 72160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:14,876-Speed 3433.84 
samples/sec Loss 5.7353 LearningRate 0.0369 Epoch: 14 Global Step: 72170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:17,857-Speed 3435.61 samples/sec Loss 5.8270 LearningRate 0.0368 Epoch: 14 Global Step: 72180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:20,867-Speed 3403.60 samples/sec Loss 5.7551 LearningRate 0.0368 Epoch: 14 Global Step: 72190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:23,853-Speed 3431.12 samples/sec Loss 5.7119 LearningRate 0.0368 Epoch: 14 Global Step: 72200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:26,843-Speed 3425.92 samples/sec Loss 5.6491 LearningRate 0.0368 Epoch: 14 Global Step: 72210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-20 00:24:29,836-Speed 3422.18 samples/sec Loss 5.7173 LearningRate 0.0368 Epoch: 14 Global Step: 72220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:32,889-Speed 3354.75 samples/sec Loss 5.7412 LearningRate 0.0368 Epoch: 14 Global Step: 72230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:35,874-Speed 3430.86 samples/sec Loss 5.7574 LearningRate 0.0368 Epoch: 14 Global Step: 72240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:38,863-Speed 3426.68 samples/sec Loss 5.7164 LearningRate 0.0367 Epoch: 14 Global Step: 72250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:41,847-Speed 3433.10 samples/sec Loss 5.8021 LearningRate 0.0367 Epoch: 14 Global Step: 72260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:44,840-Speed 3422.44 samples/sec Loss 5.7172 LearningRate 0.0367 Epoch: 14 Global Step: 72270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:47,827-Speed 3428.08 samples/sec Loss 5.7876 LearningRate 0.0367 Epoch: 14 Global Step: 72280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:50,824-Speed 3418.38 samples/sec Loss 5.8789 LearningRate 0.0367 Epoch: 14 
Global Step: 72290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:53,819-Speed 3420.31 samples/sec Loss 5.7351 LearningRate 0.0367 Epoch: 14 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:56,814-Speed 3419.24 samples/sec Loss 5.6105 LearningRate 0.0367 Epoch: 14 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:24:59,773-Speed 3462.09 samples/sec Loss 5.6518 LearningRate 0.0366 Epoch: 14 Global Step: 72320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:02,811-Speed 3371.32 samples/sec Loss 5.7435 LearningRate 0.0366 Epoch: 14 Global Step: 72330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:05,796-Speed 3430.99 samples/sec Loss 5.6582 LearningRate 0.0366 Epoch: 14 Global Step: 72340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:08,784-Speed 3428.77 samples/sec Loss 5.7309 LearningRate 0.0366 Epoch: 14 Global Step: 72350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:11,763-Speed 3437.82 samples/sec Loss 5.7075 LearningRate 0.0366 Epoch: 14 Global Step: 72360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:14,742-Speed 3438.07 samples/sec Loss 5.6224 LearningRate 0.0366 Epoch: 14 Global Step: 72370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:17,732-Speed 3425.94 samples/sec Loss 5.7762 LearningRate 0.0366 Epoch: 14 Global Step: 72380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:20,738-Speed 3407.21 samples/sec Loss 5.7913 LearningRate 0.0366 Epoch: 14 Global Step: 72390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:23,774-Speed 3374.43 samples/sec Loss 5.6314 LearningRate 0.0365 Epoch: 14 Global Step: 72400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-20 00:25:26,831-Speed 3349.82 samples/sec Loss 5.7215 LearningRate 0.0365 Epoch: 14 Global Step: 72410 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-01-20 00:25:29,863-Speed 3378.85 samples/sec Loss 5.8423 LearningRate 0.0365 Epoch: 14 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:25:33,006-Speed 3258.04 samples/sec Loss 5.7757 LearningRate 0.0365 Epoch: 14 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:25:36,013-Speed 3407.57 samples/sec Loss 5.8331 LearningRate 0.0365 Epoch: 14 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:25:39,025-Speed 3400.22 samples/sec Loss 5.8398 LearningRate 0.0365 Epoch: 14 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:25:42,014-Speed 3426.59 samples/sec Loss 5.7268 LearningRate 0.0365 Epoch: 14 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:25:44,993-Speed 3438.39 samples/sec Loss 5.7301 LearningRate 0.0364 Epoch: 14 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:25:47,973-Speed 3437.27 samples/sec Loss 5.7476 LearningRate 0.0364 Epoch: 14 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-20 00:25:50,951-Speed 3440.55 samples/sec Loss 5.6477 LearningRate 0.0364 Epoch: 14 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:25:53,938-Speed 3429.01 samples/sec Loss 5.6903 LearningRate 0.0364 Epoch: 14 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:25:56,930-Speed 3423.88 samples/sec Loss 5.7810 LearningRate 0.0364 Epoch: 14 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:25:59,930-Speed 3413.64 samples/sec Loss 5.7124 LearningRate 0.0364 Epoch: 14 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:02,956-Speed 3385.33 samples/sec Loss 5.7600 LearningRate 0.0364 Epoch: 14 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:05,937-Speed 3436.43 
samples/sec Loss 5.7353 LearningRate 0.0364 Epoch: 14 Global Step: 72540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:08,980-Speed 3365.47 samples/sec Loss 5.8386 LearningRate 0.0363 Epoch: 14 Global Step: 72550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:11,969-Speed 3427.11 samples/sec Loss 5.7373 LearningRate 0.0363 Epoch: 14 Global Step: 72560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:15,033-Speed 3343.55 samples/sec Loss 5.9696 LearningRate 0.0363 Epoch: 14 Global Step: 72570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:18,111-Speed 3326.68 samples/sec Loss 5.6043 LearningRate 0.0363 Epoch: 14 Global Step: 72580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:21,114-Speed 3411.57 samples/sec Loss 5.7261 LearningRate 0.0363 Epoch: 14 Global Step: 72590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:24,138-Speed 3387.36 samples/sec Loss 5.8756 LearningRate 0.0363 Epoch: 14 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:27,176-Speed 3370.44 samples/sec Loss 5.6410 LearningRate 0.0363 Epoch: 14 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:30,157-Speed 3436.47 samples/sec Loss 5.8630 LearningRate 0.0362 Epoch: 14 Global Step: 72620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:26:33,141-Speed 3432.26 samples/sec Loss 5.6738 LearningRate 0.0362 Epoch: 14 Global Step: 72630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:36,115-Speed 3444.35 samples/sec Loss 5.7306 LearningRate 0.0362 Epoch: 14 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:39,095-Speed 3437.54 samples/sec Loss 5.8249 LearningRate 0.0362 Epoch: 14 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:42,091-Speed 3418.53 samples/sec Loss 5.7728 LearningRate 0.0362 Epoch: 14 
Global Step: 72660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:45,161-Speed 3336.78 samples/sec Loss 5.6491 LearningRate 0.0362 Epoch: 14 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:48,197-Speed 3373.80 samples/sec Loss 5.7625 LearningRate 0.0362 Epoch: 14 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:51,282-Speed 3319.97 samples/sec Loss 5.6758 LearningRate 0.0362 Epoch: 14 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:54,355-Speed 3332.95 samples/sec Loss 5.6987 LearningRate 0.0361 Epoch: 14 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:26:57,339-Speed 3433.28 samples/sec Loss 5.7797 LearningRate 0.0361 Epoch: 14 Global Step: 72710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:00,439-Speed 3304.13 samples/sec Loss 5.7581 LearningRate 0.0361 Epoch: 14 Global Step: 72720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:03,591-Speed 3248.68 samples/sec Loss 5.7156 LearningRate 0.0361 Epoch: 14 Global Step: 72730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:06,730-Speed 3263.75 samples/sec Loss 5.7745 LearningRate 0.0361 Epoch: 14 Global Step: 72740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:09,787-Speed 3350.57 samples/sec Loss 5.7599 LearningRate 0.0361 Epoch: 14 Global Step: 72750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:12,789-Speed 3411.93 samples/sec Loss 5.7883 LearningRate 0.0361 Epoch: 14 Global Step: 72760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:15,818-Speed 3381.21 samples/sec Loss 5.7729 LearningRate 0.0360 Epoch: 14 Global Step: 72770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:18,825-Speed 3406.79 samples/sec Loss 5.6840 LearningRate 0.0360 Epoch: 14 Global Step: 72780 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-01-20 00:27:21,811-Speed 3429.91 samples/sec Loss 5.6989 LearningRate 0.0360 Epoch: 14 Global Step: 72790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:24,799-Speed 3427.66 samples/sec Loss 5.7974 LearningRate 0.0360 Epoch: 14 Global Step: 72800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:27:27,799-Speed 3414.64 samples/sec Loss 5.7327 LearningRate 0.0360 Epoch: 14 Global Step: 72810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:30,787-Speed 3430.18 samples/sec Loss 5.7001 LearningRate 0.0360 Epoch: 14 Global Step: 72820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:33,780-Speed 3421.18 samples/sec Loss 5.7738 LearningRate 0.0360 Epoch: 14 Global Step: 72830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:36,801-Speed 3390.83 samples/sec Loss 5.7033 LearningRate 0.0359 Epoch: 14 Global Step: 72840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:39,796-Speed 3420.30 samples/sec Loss 5.7700 LearningRate 0.0359 Epoch: 14 Global Step: 72850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:42,797-Speed 3412.87 samples/sec Loss 5.6793 LearningRate 0.0359 Epoch: 14 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:45,832-Speed 3375.53 samples/sec Loss 5.7726 LearningRate 0.0359 Epoch: 14 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:48,842-Speed 3402.28 samples/sec Loss 5.8677 LearningRate 0.0359 Epoch: 14 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:51,982-Speed 3262.87 samples/sec Loss 5.8683 LearningRate 0.0359 Epoch: 14 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:54,970-Speed 3428.27 samples/sec Loss 5.7026 LearningRate 0.0359 Epoch: 14 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:27:57,983-Speed 3398.99 
samples/sec Loss 5.8107 LearningRate 0.0359 Epoch: 14 Global Step: 72910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:28:00,971-Speed 3428.46 samples/sec Loss 5.7659 LearningRate 0.0358 Epoch: 14 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:03,979-Speed 3404.75 samples/sec Loss 5.7598 LearningRate 0.0358 Epoch: 14 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:06,969-Speed 3426.46 samples/sec Loss 5.7292 LearningRate 0.0358 Epoch: 14 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:09,956-Speed 3428.74 samples/sec Loss 5.8480 LearningRate 0.0358 Epoch: 14 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:12,949-Speed 3422.63 samples/sec Loss 5.7509 LearningRate 0.0358 Epoch: 14 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:15,990-Speed 3368.80 samples/sec Loss 5.7862 LearningRate 0.0358 Epoch: 14 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:18,981-Speed 3423.82 samples/sec Loss 5.7509 LearningRate 0.0358 Epoch: 14 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:21,977-Speed 3419.15 samples/sec Loss 5.8182 LearningRate 0.0357 Epoch: 14 Global Step: 72990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:24,997-Speed 3391.36 samples/sec Loss 5.8299 LearningRate 0.0357 Epoch: 14 Global Step: 73000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:27,987-Speed 3425.17 samples/sec Loss 5.7682 LearningRate 0.0357 Epoch: 14 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:30,966-Speed 3439.99 samples/sec Loss 5.7470 LearningRate 0.0357 Epoch: 14 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:28:33,942-Speed 3441.00 samples/sec Loss 5.5484 LearningRate 0.0357 Epoch: 14 
Global Step: 73030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:28:36,983-Speed 3368.89 samples/sec Loss 5.7366 LearningRate 0.0357 Epoch: 14 Global Step: 73040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:28:39,968-Speed 3431.36 samples/sec Loss 5.7002 LearningRate 0.0357 Epoch: 14 Global Step: 73050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:28:42,968-Speed 3413.42 samples/sec Loss 5.7283 LearningRate 0.0357 Epoch: 14 Global Step: 73060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:28:45,961-Speed 3422.69 samples/sec Loss 5.7666 LearningRate 0.0356 Epoch: 14 Global Step: 73070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:28:48,938-Speed 3441.18 samples/sec Loss 5.7952 LearningRate 0.0356 Epoch: 14 Global Step: 73080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:28:51,920-Speed 3435.05 samples/sec Loss 5.8686 LearningRate 0.0356 Epoch: 14 Global Step: 73090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:28:54,909-Speed 3426.07 samples/sec Loss 5.8403 LearningRate 0.0356 Epoch: 14 Global Step: 73100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:28:57,918-Speed 3404.04 samples/sec Loss 5.8636 LearningRate 0.0356 Epoch: 14 Global Step: 73110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:29:00,906-Speed 3428.44 samples/sec Loss 5.8119 LearningRate 0.0356 Epoch: 14 Global Step: 73120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:29:03,889-Speed 3434.06 samples/sec Loss 5.8224 LearningRate 0.0356 Epoch: 14 Global Step: 73130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:29:06,896-Speed 3405.71 samples/sec Loss 5.7432 LearningRate 0.0355 Epoch: 14 Global Step: 73140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:29:09,881-Speed 3432.11 samples/sec Loss 5.6941 LearningRate 0.0355 Epoch: 14 Global Step: 73150 Fp16 Grad Scale: 16384 Required: 5 
hours Training: 2022-01-20 00:29:12,888-Speed 3406.18 samples/sec Loss 5.6446 LearningRate 0.0355 Epoch: 14 Global Step: 73160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:29:15,878-Speed 3425.92 samples/sec Loss 5.7074 LearningRate 0.0355 Epoch: 14 Global Step: 73170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:29:18,858-Speed 3436.21 samples/sec Loss 5.7867 LearningRate 0.0355 Epoch: 14 Global Step: 73180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:21,838-Speed 3437.47 samples/sec Loss 5.7570 LearningRate 0.0355 Epoch: 14 Global Step: 73190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:24,849-Speed 3401.81 samples/sec Loss 5.6916 LearningRate 0.0355 Epoch: 14 Global Step: 73200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:28,050-Speed 3199.41 samples/sec Loss 5.7782 LearningRate 0.0355 Epoch: 14 Global Step: 73210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:31,131-Speed 3325.00 samples/sec Loss 5.7435 LearningRate 0.0354 Epoch: 14 Global Step: 73220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:34,161-Speed 3380.98 samples/sec Loss 5.9010 LearningRate 0.0354 Epoch: 14 Global Step: 73230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:37,206-Speed 3363.38 samples/sec Loss 5.7489 LearningRate 0.0354 Epoch: 14 Global Step: 73240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:40,342-Speed 3266.59 samples/sec Loss 5.7078 LearningRate 0.0354 Epoch: 14 Global Step: 73250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:43,373-Speed 3378.79 samples/sec Loss 5.6502 LearningRate 0.0354 Epoch: 14 Global Step: 73260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:46,402-Speed 3381.05 samples/sec Loss 5.6850 LearningRate 0.0354 Epoch: 14 Global Step: 73270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:29:49,455-Speed 3354.67 
samples/sec Loss 5.6918 LearningRate 0.0354 Epoch: 14 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:29:52,452-Speed 3417.85 samples/sec Loss 5.6952 LearningRate 0.0353 Epoch: 14 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:29:55,442-Speed 3427.19 samples/sec Loss 5.8369 LearningRate 0.0353 Epoch: 14 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:29:58,438-Speed 3418.81 samples/sec Loss 5.6771 LearningRate 0.0353 Epoch: 14 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:01,419-Speed 3436.02 samples/sec Loss 5.8096 LearningRate 0.0353 Epoch: 14 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:04,446-Speed 3383.75 samples/sec Loss 5.7133 LearningRate 0.0353 Epoch: 14 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:07,412-Speed 3453.27 samples/sec Loss 5.6379 LearningRate 0.0353 Epoch: 14 Global Step: 73340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:10,402-Speed 3425.94 samples/sec Loss 5.6486 LearningRate 0.0353 Epoch: 14 Global Step: 73350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:13,386-Speed 3432.63 samples/sec Loss 5.6309 LearningRate 0.0353 Epoch: 14 Global Step: 73360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:16,367-Speed 3436.72 samples/sec Loss 5.7296 LearningRate 0.0352 Epoch: 14 Global Step: 73370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:19,355-Speed 3427.57 samples/sec Loss 5.8029 LearningRate 0.0352 Epoch: 14 Global Step: 73380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:22,343-Speed 3428.12 samples/sec Loss 5.6494 LearningRate 0.0352 Epoch: 14 Global Step: 73390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:25,332-Speed 3427.35 samples/sec Loss 5.7047 LearningRate 0.0352 Epoch: 14 
Global Step: 73400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:28,354-Speed 3389.36 samples/sec Loss 5.6439 LearningRate 0.0352 Epoch: 14 Global Step: 73410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:31,335-Speed 3437.26 samples/sec Loss 5.8939 LearningRate 0.0352 Epoch: 14 Global Step: 73420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:34,319-Speed 3431.26 samples/sec Loss 5.8732 LearningRate 0.0352 Epoch: 14 Global Step: 73430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:30:37,321-Speed 3412.71 samples/sec Loss 5.6850 LearningRate 0.0351 Epoch: 14 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:40,364-Speed 3365.85 samples/sec Loss 5.7524 LearningRate 0.0351 Epoch: 14 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:43,370-Speed 3406.56 samples/sec Loss 5.7234 LearningRate 0.0351 Epoch: 14 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:46,438-Speed 3339.19 samples/sec Loss 5.8162 LearningRate 0.0351 Epoch: 14 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:49,473-Speed 3374.30 samples/sec Loss 5.8225 LearningRate 0.0351 Epoch: 14 Global Step: 73480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:52,556-Speed 3323.11 samples/sec Loss 5.6012 LearningRate 0.0351 Epoch: 14 Global Step: 73490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:55,564-Speed 3404.69 samples/sec Loss 5.6120 LearningRate 0.0351 Epoch: 14 Global Step: 73500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:30:58,558-Speed 3421.31 samples/sec Loss 5.8233 LearningRate 0.0351 Epoch: 14 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:01,619-Speed 3346.18 samples/sec Loss 5.7215 LearningRate 0.0350 Epoch: 14 Global Step: 73520 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-20 00:31:04,621-Speed 3412.12 samples/sec Loss 5.8112 LearningRate 0.0350 Epoch: 14 Global Step: 73530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:07,672-Speed 3357.07 samples/sec Loss 5.6977 LearningRate 0.0350 Epoch: 14 Global Step: 73540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:31:10,657-Speed 3431.75 samples/sec Loss 5.7669 LearningRate 0.0350 Epoch: 14 Global Step: 73550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:31:13,650-Speed 3422.12 samples/sec Loss 5.7509 LearningRate 0.0350 Epoch: 14 Global Step: 73560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:31:16,644-Speed 3420.72 samples/sec Loss 5.7209 LearningRate 0.0350 Epoch: 14 Global Step: 73570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:31:19,645-Speed 3413.27 samples/sec Loss 5.7722 LearningRate 0.0350 Epoch: 14 Global Step: 73580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:31:22,680-Speed 3375.28 samples/sec Loss 5.8614 LearningRate 0.0349 Epoch: 14 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:25,673-Speed 3422.35 samples/sec Loss 5.6827 LearningRate 0.0349 Epoch: 14 Global Step: 73600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:28,707-Speed 3376.09 samples/sec Loss 5.8389 LearningRate 0.0349 Epoch: 14 Global Step: 73610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:31,758-Speed 3356.97 samples/sec Loss 5.5817 LearningRate 0.0349 Epoch: 14 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:34,818-Speed 3347.13 samples/sec Loss 5.9632 LearningRate 0.0349 Epoch: 14 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:37,815-Speed 3417.16 samples/sec Loss 5.6941 LearningRate 0.0349 Epoch: 14 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:40,803-Speed 
3428.51 samples/sec Loss 5.7511 LearningRate 0.0349 Epoch: 14 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:43,954-Speed 3250.75 samples/sec Loss 5.7501 LearningRate 0.0349 Epoch: 14 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:47,046-Speed 3313.30 samples/sec Loss 5.5649 LearningRate 0.0348 Epoch: 14 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:50,029-Speed 3432.59 samples/sec Loss 5.7590 LearningRate 0.0348 Epoch: 14 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:31:53,003-Speed 3443.94 samples/sec Loss 5.6412 LearningRate 0.0348 Epoch: 14 Global Step: 73690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:31:55,992-Speed 3428.07 samples/sec Loss 5.6706 LearningRate 0.0348 Epoch: 14 Global Step: 73700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:31:59,004-Speed 3400.36 samples/sec Loss 5.7545 LearningRate 0.0348 Epoch: 14 Global Step: 73710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:01,989-Speed 3431.26 samples/sec Loss 5.6868 LearningRate 0.0348 Epoch: 14 Global Step: 73720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:04,985-Speed 3418.73 samples/sec Loss 5.8369 LearningRate 0.0348 Epoch: 14 Global Step: 73730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:07,974-Speed 3426.99 samples/sec Loss 5.8568 LearningRate 0.0348 Epoch: 14 Global Step: 73740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:10,998-Speed 3387.01 samples/sec Loss 5.7593 LearningRate 0.0347 Epoch: 14 Global Step: 73750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:14,070-Speed 3333.97 samples/sec Loss 5.7233 LearningRate 0.0347 Epoch: 14 Global Step: 73760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:17,212-Speed 3260.52 samples/sec Loss 5.8578 LearningRate 0.0347 
Epoch: 14 Global Step: 73770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:20,198-Speed 3430.50 samples/sec Loss 5.9015 LearningRate 0.0347 Epoch: 14 Global Step: 73780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:23,179-Speed 3435.04 samples/sec Loss 5.6651 LearningRate 0.0347 Epoch: 14 Global Step: 73790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:32:26,169-Speed 3426.54 samples/sec Loss 5.6778 LearningRate 0.0347 Epoch: 14 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:32:29,161-Speed 3423.32 samples/sec Loss 5.5796 LearningRate 0.0347 Epoch: 14 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:32:32,285-Speed 3278.67 samples/sec Loss 5.6484 LearningRate 0.0346 Epoch: 14 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:32:35,269-Speed 3432.14 samples/sec Loss 5.5479 LearningRate 0.0346 Epoch: 14 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:32:38,313-Speed 3364.47 samples/sec Loss 5.8549 LearningRate 0.0346 Epoch: 14 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:32:41,305-Speed 3423.76 samples/sec Loss 5.7910 LearningRate 0.0346 Epoch: 14 Global Step: 73850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:44,341-Speed 3374.10 samples/sec Loss 5.6985 LearningRate 0.0346 Epoch: 14 Global Step: 73860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:47,366-Speed 3385.99 samples/sec Loss 5.7627 LearningRate 0.0346 Epoch: 14 Global Step: 73870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:50,386-Speed 3392.08 samples/sec Loss 5.7898 LearningRate 0.0346 Epoch: 14 Global Step: 73880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:53,408-Speed 3388.17 samples/sec Loss 5.6079 LearningRate 0.0346 Epoch: 14 Global Step: 73890 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-20 00:32:56,432-Speed 3388.36 samples/sec Loss 5.7243 LearningRate 0.0345 Epoch: 14 Global Step: 73900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:32:59,441-Speed 3402.85 samples/sec Loss 5.6037 LearningRate 0.0345 Epoch: 14 Global Step: 73910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:33:02,456-Speed 3398.38 samples/sec Loss 5.7404 LearningRate 0.0345 Epoch: 14 Global Step: 73920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:33:05,595-Speed 3262.38 samples/sec Loss 5.7462 LearningRate 0.0345 Epoch: 14 Global Step: 73930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:33:08,613-Speed 3394.55 samples/sec Loss 5.7157 LearningRate 0.0345 Epoch: 14 Global Step: 73940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:33:11,616-Speed 3410.77 samples/sec Loss 5.7159 LearningRate 0.0345 Epoch: 14 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:33:14,609-Speed 3422.56 samples/sec Loss 5.5836 LearningRate 0.0345 Epoch: 14 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:33:17,596-Speed 3428.08 samples/sec Loss 5.7463 LearningRate 0.0344 Epoch: 14 Global Step: 73970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:33:20,582-Speed 3431.18 samples/sec Loss 5.6619 LearningRate 0.0344 Epoch: 14 Global Step: 73980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:33:23,598-Speed 3395.16 samples/sec Loss 5.8048 LearningRate 0.0344 Epoch: 14 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:33:26,584-Speed 3430.47 samples/sec Loss 5.6765 LearningRate 0.0344 Epoch: 14 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:34:09,581-[lfw][74000]XNorm: 21.994711 Training: 2022-01-20 00:34:09,581-[lfw][74000]Accuracy-Flip: 0.99717+-0.00308 Training: 2022-01-20 
00:34:09,582-[lfw][74000]Accuracy-Highest: 0.99767 Training: 2022-01-20 00:34:59,549-[cfp_fp][74000]XNorm: 19.977872 Training: 2022-01-20 00:34:59,550-[cfp_fp][74000]Accuracy-Flip: 0.97386+-0.00673 Training: 2022-01-20 00:34:59,550-[cfp_fp][74000]Accuracy-Highest: 0.97543 Training: 2022-01-20 00:35:42,760-[agedb_30][74000]XNorm: 22.202002 Training: 2022-01-20 00:35:42,794-[agedb_30][74000]Accuracy-Flip: 0.97717+-0.00885 Training: 2022-01-20 00:35:42,795-[agedb_30][74000]Accuracy-Highest: 0.97950 Training: 2022-01-20 00:35:45,773-Speed 73.57 samples/sec Loss 5.7193 LearningRate 0.0344 Epoch: 14 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:35:48,797-Speed 3386.85 samples/sec Loss 5.8075 LearningRate 0.0344 Epoch: 14 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:35:51,772-Speed 3442.84 samples/sec Loss 5.7148 LearningRate 0.0344 Epoch: 14 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:35:55,425-Speed 3418.91 samples/sec Loss 5.7389 LearningRate 0.0344 Epoch: 14 Global Step: 74040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:35:58,587-Speed 3239.69 samples/sec Loss 5.6957 LearningRate 0.0343 Epoch: 14 Global Step: 74050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:01,627-Speed 3368.30 samples/sec Loss 5.7594 LearningRate 0.0343 Epoch: 14 Global Step: 74060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:04,609-Speed 3436.09 samples/sec Loss 5.8034 LearningRate 0.0343 Epoch: 14 Global Step: 74070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:07,596-Speed 3429.16 samples/sec Loss 5.8688 LearningRate 0.0343 Epoch: 14 Global Step: 74080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:10,645-Speed 3359.87 samples/sec Loss 5.6644 LearningRate 0.0343 Epoch: 14 Global Step: 74090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:13,626-Speed 
3435.05 samples/sec Loss 5.8178 LearningRate 0.0343 Epoch: 14 Global Step: 74100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:16,608-Speed 3435.02 samples/sec Loss 5.6624 LearningRate 0.0343 Epoch: 14 Global Step: 74110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:19,587-Speed 3438.86 samples/sec Loss 5.7110 LearningRate 0.0343 Epoch: 14 Global Step: 74120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:22,571-Speed 3431.57 samples/sec Loss 5.6502 LearningRate 0.0342 Epoch: 14 Global Step: 74130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:36:25,585-Speed 3399.33 samples/sec Loss 5.7465 LearningRate 0.0342 Epoch: 14 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:28,576-Speed 3424.46 samples/sec Loss 5.7269 LearningRate 0.0342 Epoch: 14 Global Step: 74150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:31,568-Speed 3423.31 samples/sec Loss 5.7577 LearningRate 0.0342 Epoch: 14 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:34,597-Speed 3381.84 samples/sec Loss 5.7275 LearningRate 0.0342 Epoch: 14 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:37,599-Speed 3411.00 samples/sec Loss 5.5498 LearningRate 0.0342 Epoch: 14 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:40,637-Speed 3372.54 samples/sec Loss 5.7065 LearningRate 0.0342 Epoch: 14 Global Step: 74190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:43,660-Speed 3387.58 samples/sec Loss 5.7016 LearningRate 0.0341 Epoch: 14 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:46,803-Speed 3259.21 samples/sec Loss 5.7393 LearningRate 0.0341 Epoch: 14 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:49,894-Speed 3313.26 samples/sec Loss 5.7001 LearningRate 0.0341 
Epoch: 14 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:52,969-Speed 3330.63 samples/sec Loss 5.7765 LearningRate 0.0341 Epoch: 14 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:36:55,979-Speed 3403.45 samples/sec Loss 5.5980 LearningRate 0.0341 Epoch: 14 Global Step: 74240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:36:58,961-Speed 3435.33 samples/sec Loss 5.7423 LearningRate 0.0341 Epoch: 14 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:01,955-Speed 3420.79 samples/sec Loss 5.7724 LearningRate 0.0341 Epoch: 14 Global Step: 74260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:04,972-Speed 3395.07 samples/sec Loss 5.6702 LearningRate 0.0341 Epoch: 14 Global Step: 74270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:07,955-Speed 3433.67 samples/sec Loss 5.7313 LearningRate 0.0340 Epoch: 14 Global Step: 74280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:10,946-Speed 3425.14 samples/sec Loss 5.5192 LearningRate 0.0340 Epoch: 14 Global Step: 74290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:13,976-Speed 3379.74 samples/sec Loss 5.7435 LearningRate 0.0340 Epoch: 14 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:16,991-Speed 3397.69 samples/sec Loss 5.9217 LearningRate 0.0340 Epoch: 14 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:20,046-Speed 3352.85 samples/sec Loss 5.5741 LearningRate 0.0340 Epoch: 14 Global Step: 74320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:23,072-Speed 3385.02 samples/sec Loss 5.7811 LearningRate 0.0340 Epoch: 14 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:26,056-Speed 3433.14 samples/sec Loss 5.6333 LearningRate 0.0340 Epoch: 14 Global Step: 74340 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-20 00:37:29,041-Speed 3430.65 samples/sec Loss 5.7709 LearningRate 0.0340 Epoch: 14 Global Step: 74350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:37:32,155-Speed 3289.62 samples/sec Loss 5.7225 LearningRate 0.0339 Epoch: 14 Global Step: 74360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:37:35,344-Speed 3211.48 samples/sec Loss 5.7326 LearningRate 0.0339 Epoch: 14 Global Step: 74370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:37:38,360-Speed 3395.90 samples/sec Loss 5.5733 LearningRate 0.0339 Epoch: 14 Global Step: 74380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:37:41,326-Speed 3453.74 samples/sec Loss 5.7557 LearningRate 0.0339 Epoch: 14 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:44,317-Speed 3424.90 samples/sec Loss 5.5491 LearningRate 0.0339 Epoch: 14 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:47,297-Speed 3436.32 samples/sec Loss 5.8451 LearningRate 0.0339 Epoch: 14 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:50,287-Speed 3426.55 samples/sec Loss 5.6629 LearningRate 0.0339 Epoch: 14 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:53,289-Speed 3411.56 samples/sec Loss 5.6388 LearningRate 0.0338 Epoch: 14 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:56,269-Speed 3437.49 samples/sec Loss 5.6428 LearningRate 0.0338 Epoch: 14 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:37:59,260-Speed 3424.41 samples/sec Loss 5.6149 LearningRate 0.0338 Epoch: 14 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:02,268-Speed 3405.49 samples/sec Loss 5.5811 LearningRate 0.0338 Epoch: 14 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 
00:38:05,361-Speed 3311.64 samples/sec Loss 5.6686 LearningRate 0.0338 Epoch: 14 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:08,343-Speed 3434.24 samples/sec Loss 5.7410 LearningRate 0.0338 Epoch: 14 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:11,412-Speed 3337.38 samples/sec Loss 5.6487 LearningRate 0.0338 Epoch: 14 Global Step: 74490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:38:14,460-Speed 3360.83 samples/sec Loss 5.5704 LearningRate 0.0338 Epoch: 14 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:17,452-Speed 3423.72 samples/sec Loss 5.5198 LearningRate 0.0337 Epoch: 14 Global Step: 74510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:20,465-Speed 3400.01 samples/sec Loss 5.6424 LearningRate 0.0337 Epoch: 14 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:23,468-Speed 3410.78 samples/sec Loss 5.6395 LearningRate 0.0337 Epoch: 14 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:26,486-Speed 3394.03 samples/sec Loss 5.6357 LearningRate 0.0337 Epoch: 14 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:29,495-Speed 3404.02 samples/sec Loss 5.5402 LearningRate 0.0337 Epoch: 14 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:32,480-Speed 3431.72 samples/sec Loss 5.8196 LearningRate 0.0337 Epoch: 14 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:35,469-Speed 3428.42 samples/sec Loss 5.6685 LearningRate 0.0337 Epoch: 14 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:38,487-Speed 3392.69 samples/sec Loss 5.6391 LearningRate 0.0337 Epoch: 14 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:41,490-Speed 3411.48 samples/sec Loss 5.7531 
LearningRate 0.0336 Epoch: 14 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:44,470-Speed 3436.79 samples/sec Loss 5.6071 LearningRate 0.0336 Epoch: 14 Global Step: 74600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:38:47,485-Speed 3397.74 samples/sec Loss 5.6194 LearningRate 0.0336 Epoch: 14 Global Step: 74610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:38:50,448-Speed 3457.53 samples/sec Loss 5.5510 LearningRate 0.0336 Epoch: 14 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:53,536-Speed 3316.68 samples/sec Loss 5.5353 LearningRate 0.0336 Epoch: 14 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:56,643-Speed 3296.11 samples/sec Loss 5.5439 LearningRate 0.0336 Epoch: 14 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:38:59,630-Speed 3429.78 samples/sec Loss 5.7546 LearningRate 0.0336 Epoch: 14 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:02,629-Speed 3415.84 samples/sec Loss 5.6890 LearningRate 0.0335 Epoch: 14 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:05,647-Speed 3393.35 samples/sec Loss 5.6731 LearningRate 0.0335 Epoch: 14 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:08,635-Speed 3427.81 samples/sec Loss 5.5586 LearningRate 0.0335 Epoch: 14 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:11,624-Speed 3427.60 samples/sec Loss 5.5492 LearningRate 0.0335 Epoch: 14 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:14,628-Speed 3409.25 samples/sec Loss 5.6101 LearningRate 0.0335 Epoch: 14 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:17,740-Speed 3292.01 samples/sec Loss 5.7492 LearningRate 0.0335 Epoch: 14 Global Step: 74710 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:20,719-Speed 3438.05 samples/sec Loss 5.6119 LearningRate 0.0335 Epoch: 14 Global Step: 74720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:39:23,733-Speed 3398.37 samples/sec Loss 5.6205 LearningRate 0.0335 Epoch: 14 Global Step: 74730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:39:26,702-Speed 3449.17 samples/sec Loss 5.6886 LearningRate 0.0334 Epoch: 14 Global Step: 74740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:29,702-Speed 3414.32 samples/sec Loss 5.5389 LearningRate 0.0334 Epoch: 14 Global Step: 74750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:32,691-Speed 3426.65 samples/sec Loss 5.5074 LearningRate 0.0334 Epoch: 14 Global Step: 74760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:35,784-Speed 3312.48 samples/sec Loss 5.5998 LearningRate 0.0334 Epoch: 14 Global Step: 74770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:38,910-Speed 3275.89 samples/sec Loss 5.7686 LearningRate 0.0334 Epoch: 14 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:41,937-Speed 3384.32 samples/sec Loss 5.4917 LearningRate 0.0334 Epoch: 14 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:44,952-Speed 3397.31 samples/sec Loss 5.6185 LearningRate 0.0334 Epoch: 14 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:47,939-Speed 3429.27 samples/sec Loss 5.5709 LearningRate 0.0334 Epoch: 14 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:50,924-Speed 3431.41 samples/sec Loss 5.8232 LearningRate 0.0333 Epoch: 14 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:53,904-Speed 3436.51 samples/sec Loss 5.5864 LearningRate 0.0333 Epoch: 14 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-01-20 00:39:56,867-Speed 3456.81 samples/sec Loss 5.5562 LearningRate 0.0333 Epoch: 14 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:39:59,855-Speed 3428.18 samples/sec Loss 5.5991 LearningRate 0.0333 Epoch: 14 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:02,855-Speed 3413.63 samples/sec Loss 5.7159 LearningRate 0.0333 Epoch: 14 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:05,874-Speed 3393.62 samples/sec Loss 5.5767 LearningRate 0.0333 Epoch: 14 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:08,861-Speed 3428.79 samples/sec Loss 5.6002 LearningRate 0.0333 Epoch: 14 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:11,850-Speed 3427.54 samples/sec Loss 5.4045 LearningRate 0.0333 Epoch: 14 Global Step: 74890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:14,830-Speed 3437.78 samples/sec Loss 5.6700 LearningRate 0.0332 Epoch: 14 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:17,826-Speed 3418.98 samples/sec Loss 5.5397 LearningRate 0.0332 Epoch: 14 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:20,829-Speed 3410.51 samples/sec Loss 5.5278 LearningRate 0.0332 Epoch: 14 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:23,813-Speed 3432.25 samples/sec Loss 5.7506 LearningRate 0.0332 Epoch: 14 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:40:26,802-Speed 3427.52 samples/sec Loss 5.8229 LearningRate 0.0332 Epoch: 14 Global Step: 74940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:40:29,858-Speed 3351.61 samples/sec Loss 5.4865 LearningRate 0.0332 Epoch: 14 Global Step: 74950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:40:32,886-Speed 3382.01 samples/sec 
Loss 5.5837 LearningRate 0.0332 Epoch: 14 Global Step: 74960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:40:35,849-Speed 3457.26 samples/sec Loss 5.7170 LearningRate 0.0331 Epoch: 14 Global Step: 74970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:40:38,892-Speed 3366.81 samples/sec Loss 5.7823 LearningRate 0.0331 Epoch: 14 Global Step: 74980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:40:41,920-Speed 3382.36 samples/sec Loss 5.6773 LearningRate 0.0331 Epoch: 14 Global Step: 74990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:40:44,938-Speed 3394.33 samples/sec Loss 5.6328 LearningRate 0.0331 Epoch: 14 Global Step: 75000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:40:47,922-Speed 3432.17 samples/sec Loss 5.6114 LearningRate 0.0331 Epoch: 14 Global Step: 75010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:40:50,940-Speed 3393.45 samples/sec Loss 5.4810 LearningRate 0.0331 Epoch: 14 Global Step: 75020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:40:53,972-Speed 3378.68 samples/sec Loss 5.6705 LearningRate 0.0331 Epoch: 14 Global Step: 75030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:40:57,002-Speed 3380.16 samples/sec Loss 5.5955 LearningRate 0.0331 Epoch: 14 Global Step: 75040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:41:00,007-Speed 3408.54 samples/sec Loss 5.5725 LearningRate 0.0330 Epoch: 14 Global Step: 75050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:41:03,000-Speed 3422.28 samples/sec Loss 5.6091 LearningRate 0.0330 Epoch: 14 Global Step: 75060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:41:05,998-Speed 3416.94 samples/sec Loss 5.6031 LearningRate 0.0330 Epoch: 14 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:08,978-Speed 3436.72 samples/sec Loss 5.5698 LearningRate 0.0330 Epoch: 14 Global Step: 
75080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:11,968-Speed 3425.75 samples/sec Loss 5.5633 LearningRate 0.0330 Epoch: 14 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:14,950-Speed 3436.56 samples/sec Loss 5.6668 LearningRate 0.0330 Epoch: 14 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:17,930-Speed 3436.96 samples/sec Loss 5.6097 LearningRate 0.0330 Epoch: 14 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:20,916-Speed 3430.40 samples/sec Loss 5.5958 LearningRate 0.0330 Epoch: 14 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:23,911-Speed 3420.36 samples/sec Loss 5.6131 LearningRate 0.0329 Epoch: 14 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:26,915-Speed 3408.94 samples/sec Loss 5.6044 LearningRate 0.0329 Epoch: 14 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:29,925-Speed 3403.77 samples/sec Loss 5.6208 LearningRate 0.0329 Epoch: 14 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:32,921-Speed 3418.79 samples/sec Loss 5.5763 LearningRate 0.0329 Epoch: 14 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:35,955-Speed 3375.77 samples/sec Loss 5.7777 LearningRate 0.0329 Epoch: 14 Global Step: 75170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:41:38,924-Speed 3450.06 samples/sec Loss 5.5760 LearningRate 0.0329 Epoch: 14 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:41,983-Speed 3347.72 samples/sec Loss 5.6122 LearningRate 0.0329 Epoch: 14 Global Step: 75190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:44,969-Speed 3431.37 samples/sec Loss 5.5266 LearningRate 0.0329 Epoch: 14 Global Step: 75200 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-01-20 00:41:47,975-Speed 3406.88 samples/sec Loss 5.4910 LearningRate 0.0328 Epoch: 14 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:51,046-Speed 3335.90 samples/sec Loss 5.4561 LearningRate 0.0328 Epoch: 14 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:54,052-Speed 3406.34 samples/sec Loss 5.6732 LearningRate 0.0328 Epoch: 14 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:41:57,041-Speed 3427.72 samples/sec Loss 5.5431 LearningRate 0.0328 Epoch: 14 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:00,031-Speed 3425.39 samples/sec Loss 5.5023 LearningRate 0.0328 Epoch: 14 Global Step: 75250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:03,041-Speed 3403.34 samples/sec Loss 5.7289 LearningRate 0.0328 Epoch: 14 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:06,041-Speed 3414.33 samples/sec Loss 5.6130 LearningRate 0.0328 Epoch: 14 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:09,017-Speed 3441.80 samples/sec Loss 5.6722 LearningRate 0.0328 Epoch: 14 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:11,999-Speed 3435.00 samples/sec Loss 5.6354 LearningRate 0.0327 Epoch: 14 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:14,988-Speed 3426.86 samples/sec Loss 5.6376 LearningRate 0.0327 Epoch: 14 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:17,976-Speed 3427.73 samples/sec Loss 5.7438 LearningRate 0.0327 Epoch: 14 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:20,963-Speed 3429.49 samples/sec Loss 5.6269 LearningRate 0.0327 Epoch: 14 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:23,957-Speed 3420.34 
samples/sec Loss 5.6171 LearningRate 0.0327 Epoch: 14 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:26,943-Speed 3430.62 samples/sec Loss 5.7126 LearningRate 0.0327 Epoch: 14 Global Step: 75340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:29,926-Speed 3433.82 samples/sec Loss 5.6909 LearningRate 0.0327 Epoch: 14 Global Step: 75350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:32,917-Speed 3424.01 samples/sec Loss 5.6635 LearningRate 0.0326 Epoch: 14 Global Step: 75360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:35,903-Speed 3431.15 samples/sec Loss 5.7279 LearningRate 0.0326 Epoch: 14 Global Step: 75370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:38,896-Speed 3422.04 samples/sec Loss 5.5127 LearningRate 0.0326 Epoch: 14 Global Step: 75380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:42:41,890-Speed 3420.16 samples/sec Loss 5.4604 LearningRate 0.0326 Epoch: 14 Global Step: 75390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:42:44,874-Speed 3433.67 samples/sec Loss 5.6596 LearningRate 0.0326 Epoch: 14 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:47,878-Speed 3409.20 samples/sec Loss 5.6177 LearningRate 0.0326 Epoch: 14 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:50,861-Speed 3433.75 samples/sec Loss 5.5019 LearningRate 0.0326 Epoch: 14 Global Step: 75420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:53,878-Speed 3395.36 samples/sec Loss 5.6390 LearningRate 0.0326 Epoch: 14 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:56,924-Speed 3362.15 samples/sec Loss 5.6291 LearningRate 0.0325 Epoch: 14 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:42:59,980-Speed 3352.82 samples/sec Loss 5.5308 LearningRate 0.0325 Epoch: 14 
Global Step: 75450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:43:02,966-Speed 3429.65 samples/sec Loss 5.7008 LearningRate 0.0325 Epoch: 14 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:43:05,949-Speed 3433.71 samples/sec Loss 5.4939 LearningRate 0.0325 Epoch: 14 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:43:08,935-Speed 3430.46 samples/sec Loss 5.5080 LearningRate 0.0325 Epoch: 14 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:43:11,917-Speed 3434.48 samples/sec Loss 5.6131 LearningRate 0.0325 Epoch: 14 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:43:14,904-Speed 3429.40 samples/sec Loss 5.6120 LearningRate 0.0325 Epoch: 14 Global Step: 75500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:43:17,864-Speed 3460.21 samples/sec Loss 5.4998 LearningRate 0.0325 Epoch: 14 Global Step: 75510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:20,868-Speed 3410.24 samples/sec Loss 5.6219 LearningRate 0.0324 Epoch: 14 Global Step: 75520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:23,943-Speed 3330.67 samples/sec Loss 5.4369 LearningRate 0.0324 Epoch: 14 Global Step: 75530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:26,979-Speed 3373.99 samples/sec Loss 5.5255 LearningRate 0.0324 Epoch: 14 Global Step: 75540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:30,020-Speed 3368.27 samples/sec Loss 5.4331 LearningRate 0.0324 Epoch: 14 Global Step: 75550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:33,034-Speed 3397.85 samples/sec Loss 5.5557 LearningRate 0.0324 Epoch: 14 Global Step: 75560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:36,031-Speed 3419.13 samples/sec Loss 5.5484 LearningRate 0.0324 Epoch: 14 Global Step: 75570 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-01-20 00:43:39,023-Speed 3423.57 samples/sec Loss 5.6738 LearningRate 0.0324 Epoch: 14 Global Step: 75580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:42,056-Speed 3376.44 samples/sec Loss 5.4528 LearningRate 0.0324 Epoch: 14 Global Step: 75590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:45,060-Speed 3410.52 samples/sec Loss 5.5859 LearningRate 0.0323 Epoch: 14 Global Step: 75600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:43:48,055-Speed 3419.98 samples/sec Loss 5.5860 LearningRate 0.0323 Epoch: 14 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:43:51,042-Speed 3428.69 samples/sec Loss 5.6195 LearningRate 0.0323 Epoch: 14 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:43:54,068-Speed 3385.16 samples/sec Loss 5.7031 LearningRate 0.0323 Epoch: 14 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:43:57,063-Speed 3419.57 samples/sec Loss 5.5954 LearningRate 0.0323 Epoch: 14 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:00,168-Speed 3299.18 samples/sec Loss 5.5105 LearningRate 0.0323 Epoch: 14 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:03,263-Speed 3309.63 samples/sec Loss 5.3851 LearningRate 0.0323 Epoch: 14 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:06,257-Speed 3421.43 samples/sec Loss 5.6522 LearningRate 0.0323 Epoch: 14 Global Step: 75670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:09,245-Speed 3427.81 samples/sec Loss 5.6626 LearningRate 0.0322 Epoch: 14 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:12,241-Speed 3418.55 samples/sec Loss 5.6466 LearningRate 0.0322 Epoch: 14 Global Step: 75690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:15,245-Speed 3410.15 
samples/sec Loss 5.6199 LearningRate 0.0322 Epoch: 14 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:18,231-Speed 3430.18 samples/sec Loss 5.5619 LearningRate 0.0322 Epoch: 14 Global Step: 75710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:44:21,206-Speed 3442.93 samples/sec Loss 5.5875 LearningRate 0.0322 Epoch: 14 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:24,193-Speed 3428.81 samples/sec Loss 5.5543 LearningRate 0.0322 Epoch: 14 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:27,191-Speed 3416.68 samples/sec Loss 5.6783 LearningRate 0.0322 Epoch: 14 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:30,183-Speed 3423.81 samples/sec Loss 5.5900 LearningRate 0.0322 Epoch: 14 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:33,174-Speed 3423.76 samples/sec Loss 5.6305 LearningRate 0.0321 Epoch: 14 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:36,195-Speed 3392.04 samples/sec Loss 5.7110 LearningRate 0.0321 Epoch: 14 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:39,227-Speed 3378.00 samples/sec Loss 5.5359 LearningRate 0.0321 Epoch: 14 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:42,218-Speed 3424.48 samples/sec Loss 5.5357 LearningRate 0.0321 Epoch: 14 Global Step: 75790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:45,205-Speed 3429.29 samples/sec Loss 5.6390 LearningRate 0.0321 Epoch: 14 Global Step: 75800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:48,204-Speed 3415.47 samples/sec Loss 5.5834 LearningRate 0.0321 Epoch: 14 Global Step: 75810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:44:51,192-Speed 3428.40 samples/sec Loss 5.6402 LearningRate 0.0321 Epoch: 14 
Global Step: 75820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:44:54,252-Speed 3346.85 samples/sec Loss 5.5345 LearningRate 0.0321 Epoch: 14 Global Step: 75830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:44:57,369-Speed 3286.59 samples/sec Loss 5.4971 LearningRate 0.0320 Epoch: 14 Global Step: 75840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:45:00,353-Speed 3431.86 samples/sec Loss 5.5531 LearningRate 0.0320 Epoch: 14 Global Step: 75850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:45:03,480-Speed 3275.44 samples/sec Loss 5.4920 LearningRate 0.0320 Epoch: 14 Global Step: 75860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:45:06,465-Speed 3432.58 samples/sec Loss 5.5454 LearningRate 0.0320 Epoch: 14 Global Step: 75870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:45:22,650-Speed 632.71 samples/sec Loss 4.7489 LearningRate 0.0320 Epoch: 15 Global Step: 75880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:45:25,677-Speed 3384.90 samples/sec Loss 4.7915 LearningRate 0.0320 Epoch: 15 Global Step: 75890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:45:28,683-Speed 3407.82 samples/sec Loss 4.8391 LearningRate 0.0320 Epoch: 15 Global Step: 75900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:45:31,772-Speed 3315.58 samples/sec Loss 4.8336 LearningRate 0.0319 Epoch: 15 Global Step: 75910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:45:34,812-Speed 3369.63 samples/sec Loss 4.8223 LearningRate 0.0319 Epoch: 15 Global Step: 75920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:45:37,881-Speed 3337.03 samples/sec Loss 4.8152 LearningRate 0.0319 Epoch: 15 Global Step: 75930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:45:40,918-Speed 3372.56 samples/sec Loss 4.6823 LearningRate 0.0319 Epoch: 15 Global Step: 75940 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-01-20 00:45:43,915-Speed 3417.69 samples/sec Loss 4.6811 LearningRate 0.0319 Epoch: 15 Global Step: 75950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:45:46,960-Speed 3364.24 samples/sec Loss 4.8396 LearningRate 0.0319 Epoch: 15 Global Step: 75960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:45:49,981-Speed 3390.15 samples/sec Loss 4.8705 LearningRate 0.0319 Epoch: 15 Global Step: 75970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:45:53,016-Speed 3375.43 samples/sec Loss 4.6254 LearningRate 0.0319 Epoch: 15 Global Step: 75980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:45:56,058-Speed 3367.20 samples/sec Loss 4.7595 LearningRate 0.0318 Epoch: 15 Global Step: 75990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:45:59,234-Speed 3224.61 samples/sec Loss 4.7303 LearningRate 0.0318 Epoch: 15 Global Step: 76000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:46:42,340-[lfw][76000]XNorm: 23.068613 Training: 2022-01-20 00:46:42,341-[lfw][76000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-01-20 00:46:42,342-[lfw][76000]Accuracy-Highest: 0.99800 Training: 2022-01-20 00:47:32,441-[cfp_fp][76000]XNorm: 20.596299 Training: 2022-01-20 00:47:32,442-[cfp_fp][76000]Accuracy-Flip: 0.97814+-0.00808 Training: 2022-01-20 00:47:32,442-[cfp_fp][76000]Accuracy-Highest: 0.97814 Training: 2022-01-20 00:48:15,571-[agedb_30][76000]XNorm: 22.613331 Training: 2022-01-20 00:48:15,571-[agedb_30][76000]Accuracy-Flip: 0.97633+-0.00865 Training: 2022-01-20 00:48:15,572-[agedb_30][76000]Accuracy-Highest: 0.97950 Training: 2022-01-20 00:48:18,552-Speed 73.50 samples/sec Loss 4.7292 LearningRate 0.0318 Epoch: 15 Global Step: 76010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:48:21,531-Speed 3437.67 samples/sec Loss 4.8204 LearningRate 0.0318 Epoch: 15 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 
00:48:24,516-Speed 3431.68 samples/sec Loss 4.8865 LearningRate 0.0318 Epoch: 15 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:27,511-Speed 3420.27 samples/sec Loss 4.8281 LearningRate 0.0318 Epoch: 15 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:30,568-Speed 3350.16 samples/sec Loss 4.8519 LearningRate 0.0318 Epoch: 15 Global Step: 76050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:33,581-Speed 3399.83 samples/sec Loss 4.9406 LearningRate 0.0318 Epoch: 15 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:36,577-Speed 3418.08 samples/sec Loss 4.7583 LearningRate 0.0317 Epoch: 15 Global Step: 76070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:39,577-Speed 3414.74 samples/sec Loss 4.9361 LearningRate 0.0317 Epoch: 15 Global Step: 76080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:42,568-Speed 3425.53 samples/sec Loss 4.8890 LearningRate 0.0317 Epoch: 15 Global Step: 76090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:45,556-Speed 3428.31 samples/sec Loss 5.0172 LearningRate 0.0317 Epoch: 15 Global Step: 76100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:48,538-Speed 3435.05 samples/sec Loss 5.0876 LearningRate 0.0317 Epoch: 15 Global Step: 76110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:51,529-Speed 3424.68 samples/sec Loss 4.8815 LearningRate 0.0317 Epoch: 15 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:54,553-Speed 3386.54 samples/sec Loss 4.8525 LearningRate 0.0317 Epoch: 15 Global Step: 76130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:48:57,578-Speed 3386.59 samples/sec Loss 4.8768 LearningRate 0.0317 Epoch: 15 Global Step: 76140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:00,599-Speed 3389.75 samples/sec Loss 4.9400 
LearningRate 0.0316 Epoch: 15 Global Step: 76150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:03,599-Speed 3415.09 samples/sec Loss 4.8908 LearningRate 0.0316 Epoch: 15 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:06,588-Speed 3426.33 samples/sec Loss 4.9002 LearningRate 0.0316 Epoch: 15 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:09,591-Speed 3410.24 samples/sec Loss 4.9727 LearningRate 0.0316 Epoch: 15 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:12,581-Speed 3426.44 samples/sec Loss 5.0100 LearningRate 0.0316 Epoch: 15 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:15,575-Speed 3421.42 samples/sec Loss 4.9101 LearningRate 0.0316 Epoch: 15 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:18,602-Speed 3384.51 samples/sec Loss 5.1087 LearningRate 0.0316 Epoch: 15 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:21,590-Speed 3426.65 samples/sec Loss 4.9055 LearningRate 0.0316 Epoch: 15 Global Step: 76220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:49:24,578-Speed 3428.71 samples/sec Loss 4.9466 LearningRate 0.0315 Epoch: 15 Global Step: 76230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:49:27,551-Speed 3444.85 samples/sec Loss 4.9630 LearningRate 0.0315 Epoch: 15 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:30,537-Speed 3430.66 samples/sec Loss 5.0770 LearningRate 0.0315 Epoch: 15 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:33,601-Speed 3343.19 samples/sec Loss 4.9908 LearningRate 0.0315 Epoch: 15 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:36,611-Speed 3402.32 samples/sec Loss 4.8184 LearningRate 0.0315 Epoch: 15 Global Step: 76270 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:39,597-Speed 3430.18 samples/sec Loss 5.0321 LearningRate 0.0315 Epoch: 15 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:42,635-Speed 3372.27 samples/sec Loss 5.0439 LearningRate 0.0315 Epoch: 15 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:45,634-Speed 3415.17 samples/sec Loss 4.9940 LearningRate 0.0315 Epoch: 15 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:48,619-Speed 3431.67 samples/sec Loss 4.8536 LearningRate 0.0314 Epoch: 15 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:51,597-Speed 3439.14 samples/sec Loss 5.0959 LearningRate 0.0314 Epoch: 15 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:54,575-Speed 3439.12 samples/sec Loss 5.0904 LearningRate 0.0314 Epoch: 15 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:49:57,611-Speed 3374.55 samples/sec Loss 5.0353 LearningRate 0.0314 Epoch: 15 Global Step: 76340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:50:00,616-Speed 3407.86 samples/sec Loss 4.9563 LearningRate 0.0314 Epoch: 15 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:03,600-Speed 3432.98 samples/sec Loss 5.1644 LearningRate 0.0314 Epoch: 15 Global Step: 76360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:06,605-Speed 3408.93 samples/sec Loss 4.9137 LearningRate 0.0314 Epoch: 15 Global Step: 76370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:09,599-Speed 3421.32 samples/sec Loss 5.0616 LearningRate 0.0314 Epoch: 15 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:12,596-Speed 3418.16 samples/sec Loss 4.9166 LearningRate 0.0313 Epoch: 15 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-01-20 00:50:15,581-Speed 3431.10 samples/sec Loss 5.0599 LearningRate 0.0313 Epoch: 15 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:18,559-Speed 3440.27 samples/sec Loss 5.1035 LearningRate 0.0313 Epoch: 15 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:21,563-Speed 3408.73 samples/sec Loss 4.9813 LearningRate 0.0313 Epoch: 15 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:24,604-Speed 3368.06 samples/sec Loss 5.1417 LearningRate 0.0313 Epoch: 15 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:27,613-Speed 3404.75 samples/sec Loss 5.0042 LearningRate 0.0313 Epoch: 15 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:30,616-Speed 3411.24 samples/sec Loss 5.2028 LearningRate 0.0313 Epoch: 15 Global Step: 76450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:50:33,585-Speed 3450.72 samples/sec Loss 4.9929 LearningRate 0.0313 Epoch: 15 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:50:36,553-Speed 3450.64 samples/sec Loss 5.1479 LearningRate 0.0312 Epoch: 15 Global Step: 76470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:50:39,534-Speed 3435.26 samples/sec Loss 5.2147 LearningRate 0.0312 Epoch: 15 Global Step: 76480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:50:42,523-Speed 3426.84 samples/sec Loss 5.0679 LearningRate 0.0312 Epoch: 15 Global Step: 76490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:50:45,503-Speed 3437.23 samples/sec Loss 5.1567 LearningRate 0.0312 Epoch: 15 Global Step: 76500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:50:48,513-Speed 3403.78 samples/sec Loss 5.1239 LearningRate 0.0312 Epoch: 15 Global Step: 76510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:50:51,490-Speed 3440.15 samples/sec Loss 
5.1304 LearningRate 0.0312 Epoch: 15 Global Step: 76520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:50:54,475-Speed 3431.03 samples/sec Loss 5.1450 LearningRate 0.0312 Epoch: 15 Global Step: 76530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:50:57,457-Speed 3434.86 samples/sec Loss 5.0492 LearningRate 0.0312 Epoch: 15 Global Step: 76540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:51:00,440-Speed 3434.65 samples/sec Loss 5.0992 LearningRate 0.0311 Epoch: 15 Global Step: 76550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:51:03,423-Speed 3434.12 samples/sec Loss 5.0590 LearningRate 0.0311 Epoch: 15 Global Step: 76560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:51:06,402-Speed 3437.23 samples/sec Loss 5.0473 LearningRate 0.0311 Epoch: 15 Global Step: 76570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-20 00:51:09,382-Speed 3436.95 samples/sec Loss 5.1685 LearningRate 0.0311 Epoch: 15 Global Step: 76580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:12,366-Speed 3433.82 samples/sec Loss 5.1331 LearningRate 0.0311 Epoch: 15 Global Step: 76590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:15,344-Speed 3438.84 samples/sec Loss 5.1212 LearningRate 0.0311 Epoch: 15 Global Step: 76600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:18,356-Speed 3400.56 samples/sec Loss 5.0986 LearningRate 0.0311 Epoch: 15 Global Step: 76610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:21,383-Speed 3383.95 samples/sec Loss 5.2254 LearningRate 0.0311 Epoch: 15 Global Step: 76620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:24,415-Speed 3377.98 samples/sec Loss 5.1000 LearningRate 0.0310 Epoch: 15 Global Step: 76630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:27,443-Speed 3383.74 samples/sec Loss 5.0784 LearningRate 0.0310 Epoch: 15 Global Step: 76640 
Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:30,426-Speed 3433.56 samples/sec Loss 5.1495 LearningRate 0.0310 Epoch: 15 Global Step: 76650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:33,411-Speed 3431.64 samples/sec Loss 5.1914 LearningRate 0.0310 Epoch: 15 Global Step: 76660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:36,402-Speed 3424.26 samples/sec Loss 5.0210 LearningRate 0.0310 Epoch: 15 Global Step: 76670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:51:39,488-Speed 3318.27 samples/sec Loss 5.1571 LearningRate 0.0310 Epoch: 15 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:51:42,543-Speed 3352.92 samples/sec Loss 5.1328 LearningRate 0.0310 Epoch: 15 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:51:45,540-Speed 3417.89 samples/sec Loss 5.2011 LearningRate 0.0310 Epoch: 15 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:51:48,519-Speed 3439.88 samples/sec Loss 5.1472 LearningRate 0.0309 Epoch: 15 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:51:51,514-Speed 3418.93 samples/sec Loss 5.1626 LearningRate 0.0309 Epoch: 15 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:51:54,505-Speed 3425.19 samples/sec Loss 5.1181 LearningRate 0.0309 Epoch: 15 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:51:57,490-Speed 3432.34 samples/sec Loss 5.1827 LearningRate 0.0309 Epoch: 15 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:00,500-Speed 3402.16 samples/sec Loss 5.1504 LearningRate 0.0309 Epoch: 15 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:03,483-Speed 3434.57 samples/sec Loss 5.3142 LearningRate 0.0309 Epoch: 15 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-01-20 00:52:06,496-Speed 3398.81 samples/sec Loss 5.1577 LearningRate 0.0309 Epoch: 15 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:09,509-Speed 3399.09 samples/sec Loss 5.1788 LearningRate 0.0309 Epoch: 15 Global Step: 76780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:52:12,487-Speed 3440.64 samples/sec Loss 5.1804 LearningRate 0.0308 Epoch: 15 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:15,496-Speed 3402.84 samples/sec Loss 5.2063 LearningRate 0.0308 Epoch: 15 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:18,538-Speed 3367.52 samples/sec Loss 5.1806 LearningRate 0.0308 Epoch: 15 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:21,524-Speed 3430.78 samples/sec Loss 5.3017 LearningRate 0.0308 Epoch: 15 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:24,540-Speed 3395.76 samples/sec Loss 5.0489 LearningRate 0.0308 Epoch: 15 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:27,579-Speed 3371.27 samples/sec Loss 5.1555 LearningRate 0.0308 Epoch: 15 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:30,560-Speed 3435.30 samples/sec Loss 5.2138 LearningRate 0.0308 Epoch: 15 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:33,548-Speed 3428.10 samples/sec Loss 5.2527 LearningRate 0.0308 Epoch: 15 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:36,595-Speed 3361.98 samples/sec Loss 5.1623 LearningRate 0.0307 Epoch: 15 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:39,768-Speed 3227.17 samples/sec Loss 5.2100 LearningRate 0.0307 Epoch: 15 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:42,827-Speed 3348.41 samples/sec Loss 
5.1972 LearningRate 0.0307 Epoch: 15 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:45,822-Speed 3420.04 samples/sec Loss 5.2974 LearningRate 0.0307 Epoch: 15 Global Step: 76900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:48,807-Speed 3433.02 samples/sec Loss 5.2306 LearningRate 0.0307 Epoch: 15 Global Step: 76910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:51,827-Speed 3391.02 samples/sec Loss 5.3109 LearningRate 0.0307 Epoch: 15 Global Step: 76920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:52:54,801-Speed 3443.78 samples/sec Loss 5.0876 LearningRate 0.0307 Epoch: 15 Global Step: 76930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:52:57,782-Speed 3436.45 samples/sec Loss 5.2622 LearningRate 0.0307 Epoch: 15 Global Step: 76940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:00,763-Speed 3435.73 samples/sec Loss 5.2201 LearningRate 0.0306 Epoch: 15 Global Step: 76950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:03,744-Speed 3437.00 samples/sec Loss 5.0898 LearningRate 0.0306 Epoch: 15 Global Step: 76960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:06,720-Speed 3441.55 samples/sec Loss 5.2186 LearningRate 0.0306 Epoch: 15 Global Step: 76970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:09,709-Speed 3426.07 samples/sec Loss 5.1647 LearningRate 0.0306 Epoch: 15 Global Step: 76980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:12,687-Speed 3440.05 samples/sec Loss 5.2495 LearningRate 0.0306 Epoch: 15 Global Step: 76990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:15,671-Speed 3433.07 samples/sec Loss 5.2450 LearningRate 0.0306 Epoch: 15 Global Step: 77000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:18,704-Speed 3377.20 samples/sec Loss 5.2391 LearningRate 0.0306 Epoch: 15 Global Step: 77010 
Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:21,693-Speed 3426.61 samples/sec Loss 5.2528 LearningRate 0.0306 Epoch: 15 Global Step: 77020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:53:24,748-Speed 3352.84 samples/sec Loss 5.2295 LearningRate 0.0305 Epoch: 15 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:27,744-Speed 3419.29 samples/sec Loss 5.2807 LearningRate 0.0305 Epoch: 15 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:30,746-Speed 3410.90 samples/sec Loss 5.4312 LearningRate 0.0305 Epoch: 15 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:33,736-Speed 3426.26 samples/sec Loss 5.2474 LearningRate 0.0305 Epoch: 15 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:36,719-Speed 3433.45 samples/sec Loss 5.3031 LearningRate 0.0305 Epoch: 15 Global Step: 77070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:39,715-Speed 3418.95 samples/sec Loss 5.1592 LearningRate 0.0305 Epoch: 15 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:42,758-Speed 3366.49 samples/sec Loss 5.2312 LearningRate 0.0305 Epoch: 15 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:45,786-Speed 3382.52 samples/sec Loss 5.1365 LearningRate 0.0305 Epoch: 15 Global Step: 77100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:49,053-Speed 3134.89 samples/sec Loss 5.2636 LearningRate 0.0305 Epoch: 15 Global Step: 77110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:52,032-Speed 3438.38 samples/sec Loss 5.2500 LearningRate 0.0304 Epoch: 15 Global Step: 77120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:53:55,017-Speed 3431.35 samples/sec Loss 5.1551 LearningRate 0.0304 Epoch: 15 Global Step: 77130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-01-20 00:53:58,034-Speed 3395.27 samples/sec Loss 5.1192 LearningRate 0.0304 Epoch: 15 Global Step: 77140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:01,032-Speed 3416.64 samples/sec Loss 5.2621 LearningRate 0.0304 Epoch: 15 Global Step: 77150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:04,024-Speed 3424.16 samples/sec Loss 5.1986 LearningRate 0.0304 Epoch: 15 Global Step: 77160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:07,099-Speed 3330.87 samples/sec Loss 5.2787 LearningRate 0.0304 Epoch: 15 Global Step: 77170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:10,081-Speed 3434.49 samples/sec Loss 5.2384 LearningRate 0.0304 Epoch: 15 Global Step: 77180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:13,086-Speed 3409.50 samples/sec Loss 5.1433 LearningRate 0.0304 Epoch: 15 Global Step: 77190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:16,103-Speed 3394.73 samples/sec Loss 5.2123 LearningRate 0.0303 Epoch: 15 Global Step: 77200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:19,137-Speed 3376.64 samples/sec Loss 5.2354 LearningRate 0.0303 Epoch: 15 Global Step: 77210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:22,159-Speed 3389.20 samples/sec Loss 5.2724 LearningRate 0.0303 Epoch: 15 Global Step: 77220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:25,143-Speed 3431.74 samples/sec Loss 5.3261 LearningRate 0.0303 Epoch: 15 Global Step: 77230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:28,128-Speed 3431.16 samples/sec Loss 5.2409 LearningRate 0.0303 Epoch: 15 Global Step: 77240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:54:31,129-Speed 3413.75 samples/sec Loss 5.2390 LearningRate 0.0303 Epoch: 15 Global Step: 77250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:54:34,119-Speed 3425.74 samples/sec Loss 
5.3326 LearningRate 0.0303 Epoch: 15 Global Step: 77260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:54:37,109-Speed 3425.57 samples/sec Loss 5.2143 LearningRate 0.0303 Epoch: 15 Global Step: 77270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:54:40,099-Speed 3426.04 samples/sec Loss 5.1593 LearningRate 0.0302 Epoch: 15 Global Step: 77280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:54:43,085-Speed 3430.27 samples/sec Loss 5.1584 LearningRate 0.0302 Epoch: 15 Global Step: 77290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:54:46,067-Speed 3434.73 samples/sec Loss 5.3909 LearningRate 0.0302 Epoch: 15 Global Step: 77300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:54:49,048-Speed 3435.89 samples/sec Loss 5.1998 LearningRate 0.0302 Epoch: 15 Global Step: 77310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:54:52,032-Speed 3432.81 samples/sec Loss 5.2052 LearningRate 0.0302 Epoch: 15 Global Step: 77320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:55,018-Speed 3430.60 samples/sec Loss 5.3044 LearningRate 0.0302 Epoch: 15 Global Step: 77330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:54:58,016-Speed 3415.78 samples/sec Loss 5.2578 LearningRate 0.0302 Epoch: 15 Global Step: 77340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:01,019-Speed 3411.14 samples/sec Loss 5.3755 LearningRate 0.0302 Epoch: 15 Global Step: 77350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:04,039-Speed 3392.27 samples/sec Loss 5.2579 LearningRate 0.0301 Epoch: 15 Global Step: 77360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:07,020-Speed 3434.82 samples/sec Loss 5.1091 LearningRate 0.0301 Epoch: 15 Global Step: 77370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:10,025-Speed 3409.66 samples/sec Loss 5.2596 LearningRate 0.0301 Epoch: 15 Global Step: 77380 
Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:13,058-Speed 3376.57 samples/sec Loss 5.2451 LearningRate 0.0301 Epoch: 15 Global Step: 77390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:16,047-Speed 3426.53 samples/sec Loss 5.1632 LearningRate 0.0301 Epoch: 15 Global Step: 77400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:19,034-Speed 3429.98 samples/sec Loss 5.3084 LearningRate 0.0301 Epoch: 15 Global Step: 77410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:22,023-Speed 3426.28 samples/sec Loss 5.2117 LearningRate 0.0301 Epoch: 15 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:55:25,005-Speed 3434.69 samples/sec Loss 5.2516 LearningRate 0.0301 Epoch: 15 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:55:27,977-Speed 3446.84 samples/sec Loss 5.3042 LearningRate 0.0300 Epoch: 15 Global Step: 77440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:31,058-Speed 3324.31 samples/sec Loss 5.2159 LearningRate 0.0300 Epoch: 15 Global Step: 77450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:34,212-Speed 3247.83 samples/sec Loss 5.2548 LearningRate 0.0300 Epoch: 15 Global Step: 77460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:37,238-Speed 3385.62 samples/sec Loss 5.0700 LearningRate 0.0300 Epoch: 15 Global Step: 77470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:40,218-Speed 3436.58 samples/sec Loss 5.3324 LearningRate 0.0300 Epoch: 15 Global Step: 77480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:43,200-Speed 3435.96 samples/sec Loss 5.2760 LearningRate 0.0300 Epoch: 15 Global Step: 77490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:46,199-Speed 3415.10 samples/sec Loss 5.2080 LearningRate 0.0300 Epoch: 15 Global Step: 77500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-01-20 00:55:49,199-Speed 3414.67 samples/sec Loss 5.3057 LearningRate 0.0300 Epoch: 15 Global Step: 77510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:52,185-Speed 3429.85 samples/sec Loss 5.3203 LearningRate 0.0299 Epoch: 15 Global Step: 77520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:55,205-Speed 3391.07 samples/sec Loss 5.3171 LearningRate 0.0299 Epoch: 15 Global Step: 77530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:55:58,192-Speed 3429.46 samples/sec Loss 5.1948 LearningRate 0.0299 Epoch: 15 Global Step: 77540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:01,174-Speed 3434.60 samples/sec Loss 5.1432 LearningRate 0.0299 Epoch: 15 Global Step: 77550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:04,155-Speed 3436.44 samples/sec Loss 5.2323 LearningRate 0.0299 Epoch: 15 Global Step: 77560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:07,197-Speed 3367.96 samples/sec Loss 5.2863 LearningRate 0.0299 Epoch: 15 Global Step: 77570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:10,326-Speed 3272.97 samples/sec Loss 5.2574 LearningRate 0.0299 Epoch: 15 Global Step: 77580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:13,373-Speed 3362.17 samples/sec Loss 5.1816 LearningRate 0.0299 Epoch: 15 Global Step: 77590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:16,357-Speed 3431.70 samples/sec Loss 5.2251 LearningRate 0.0298 Epoch: 15 Global Step: 77600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:19,339-Speed 3434.69 samples/sec Loss 5.2115 LearningRate 0.0298 Epoch: 15 Global Step: 77610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:22,329-Speed 3425.56 samples/sec Loss 5.2874 LearningRate 0.0298 Epoch: 15 Global Step: 77620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:25,313-Speed 3432.63 samples/sec Loss 
5.2999 LearningRate 0.0298 Epoch: 15 Global Step: 77630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:28,331-Speed 3393.86 samples/sec Loss 5.2716 LearningRate 0.0298 Epoch: 15 Global Step: 77640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:31,390-Speed 3348.94 samples/sec Loss 5.2306 LearningRate 0.0298 Epoch: 15 Global Step: 77650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:34,411-Speed 3391.25 samples/sec Loss 5.3449 LearningRate 0.0298 Epoch: 15 Global Step: 77660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:37,410-Speed 3415.14 samples/sec Loss 5.2497 LearningRate 0.0298 Epoch: 15 Global Step: 77670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:40,404-Speed 3420.54 samples/sec Loss 5.2648 LearningRate 0.0298 Epoch: 15 Global Step: 77680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:56:43,431-Speed 3384.89 samples/sec Loss 5.1290 LearningRate 0.0297 Epoch: 15 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:46,465-Speed 3374.99 samples/sec Loss 5.2092 LearningRate 0.0297 Epoch: 15 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:49,551-Speed 3319.11 samples/sec Loss 5.1964 LearningRate 0.0297 Epoch: 15 Global Step: 77710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:52,531-Speed 3437.47 samples/sec Loss 5.2331 LearningRate 0.0297 Epoch: 15 Global Step: 77720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:55,528-Speed 3418.14 samples/sec Loss 5.1611 LearningRate 0.0297 Epoch: 15 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:56:58,521-Speed 3421.78 samples/sec Loss 5.2513 LearningRate 0.0297 Epoch: 15 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:01,501-Speed 3438.27 samples/sec Loss 5.2832 LearningRate 0.0297 Epoch: 15 Global Step: 77750 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:04,485-Speed 3431.60 samples/sec Loss 5.3306 LearningRate 0.0297 Epoch: 15 Global Step: 77760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:07,493-Speed 3405.44 samples/sec Loss 5.3256 LearningRate 0.0296 Epoch: 15 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:10,473-Speed 3436.81 samples/sec Loss 5.3385 LearningRate 0.0296 Epoch: 15 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:13,454-Speed 3437.33 samples/sec Loss 5.1112 LearningRate 0.0296 Epoch: 15 Global Step: 77790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 00:57:16,464-Speed 3402.66 samples/sec Loss 5.2106 LearningRate 0.0296 Epoch: 15 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:19,505-Speed 3368.01 samples/sec Loss 5.0829 LearningRate 0.0296 Epoch: 15 Global Step: 77810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:22,660-Speed 3245.92 samples/sec Loss 5.1367 LearningRate 0.0296 Epoch: 15 Global Step: 77820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:25,819-Speed 3242.11 samples/sec Loss 5.3830 LearningRate 0.0296 Epoch: 15 Global Step: 77830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:28,903-Speed 3322.13 samples/sec Loss 5.3352 LearningRate 0.0296 Epoch: 15 Global Step: 77840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:31,890-Speed 3429.59 samples/sec Loss 5.3747 LearningRate 0.0295 Epoch: 15 Global Step: 77850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:34,883-Speed 3422.06 samples/sec Loss 5.2241 LearningRate 0.0295 Epoch: 15 Global Step: 77860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:37,867-Speed 3432.41 samples/sec Loss 5.2282 LearningRate 0.0295 Epoch: 15 Global Step: 77870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-01-20 00:57:40,873-Speed 3407.71 samples/sec Loss 5.1482 LearningRate 0.0295 Epoch: 15 Global Step: 77880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:43,862-Speed 3426.61 samples/sec Loss 5.1748 LearningRate 0.0295 Epoch: 15 Global Step: 77890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:46,886-Speed 3387.40 samples/sec Loss 5.2367 LearningRate 0.0295 Epoch: 15 Global Step: 77900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:57:49,870-Speed 3431.83 samples/sec Loss 5.1814 LearningRate 0.0295 Epoch: 15 Global Step: 77910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:52,944-Speed 3332.91 samples/sec Loss 5.1716 LearningRate 0.0295 Epoch: 15 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:55,936-Speed 3422.62 samples/sec Loss 5.1888 LearningRate 0.0294 Epoch: 15 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:57:58,987-Speed 3357.47 samples/sec Loss 5.2556 LearningRate 0.0294 Epoch: 15 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:58:01,985-Speed 3417.27 samples/sec Loss 5.2972 LearningRate 0.0294 Epoch: 15 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:58:04,984-Speed 3414.77 samples/sec Loss 5.3174 LearningRate 0.0294 Epoch: 15 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:58:08,060-Speed 3330.16 samples/sec Loss 5.3121 LearningRate 0.0294 Epoch: 15 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:58:11,044-Speed 3432.54 samples/sec Loss 5.2306 LearningRate 0.0294 Epoch: 15 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 00:58:14,021-Speed 3441.11 samples/sec Loss 5.2702 LearningRate 0.0294 Epoch: 15 Global Step: 77990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:58:17,014-Speed 3422.04 samples/sec Loss 
5.2956 LearningRate 0.0294 Epoch: 15 Global Step: 78000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 00:59:00,302-[lfw][78000]XNorm: 22.011551 Training: 2022-01-20 00:59:00,303-[lfw][78000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-01-20 00:59:00,303-[lfw][78000]Accuracy-Highest: 0.99817 Training: 2022-01-20 00:59:50,641-[cfp_fp][78000]XNorm: 19.936603 Training: 2022-01-20 00:59:50,642-[cfp_fp][78000]Accuracy-Flip: 0.97514+-0.00627 Training: 2022-01-20 00:59:50,643-[cfp_fp][78000]Accuracy-Highest: 0.97814 Training: 2022-01-20 01:00:34,258-[agedb_30][78000]XNorm: 21.843680 Training: 2022-01-20 01:00:34,259-[agedb_30][78000]Accuracy-Flip: 0.98100+-0.00655 Training: 2022-01-20 01:00:34,260-[agedb_30][78000]Accuracy-Highest: 0.98100 Training: 2022-01-20 01:00:37,238-Speed 73.03 samples/sec Loss 5.2776 LearningRate 0.0293 Epoch: 15 Global Step: 78010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:00:40,206-Speed 3450.52 samples/sec Loss 5.2231 LearningRate 0.0293 Epoch: 15 Global Step: 78020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:00:43,177-Speed 3447.07 samples/sec Loss 5.1896 LearningRate 0.0293 Epoch: 15 Global Step: 78030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:00:46,145-Speed 3451.54 samples/sec Loss 5.1894 LearningRate 0.0293 Epoch: 15 Global Step: 78040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:00:49,193-Speed 3360.38 samples/sec Loss 5.4048 LearningRate 0.0293 Epoch: 15 Global Step: 78050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:00:52,173-Speed 3437.70 samples/sec Loss 5.3614 LearningRate 0.0293 Epoch: 15 Global Step: 78060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:00:55,150-Speed 3439.97 samples/sec Loss 5.3262 LearningRate 0.0293 Epoch: 15 Global Step: 78070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:00:58,297-Speed 3254.92 samples/sec Loss 5.2199 LearningRate 0.0293 Epoch: 15 
Global Step: 78080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:01,316-Speed 3392.90 samples/sec Loss 5.2059 LearningRate 0.0293 Epoch: 15 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:01:04,283-Speed 3452.14 samples/sec Loss 5.2783 LearningRate 0.0292 Epoch: 15 Global Step: 78100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:07,261-Speed 3439.67 samples/sec Loss 5.1935 LearningRate 0.0292 Epoch: 15 Global Step: 78110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:10,243-Speed 3435.37 samples/sec Loss 4.9704 LearningRate 0.0292 Epoch: 15 Global Step: 78120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:13,222-Speed 3437.34 samples/sec Loss 5.3807 LearningRate 0.0292 Epoch: 15 Global Step: 78130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:16,212-Speed 3426.74 samples/sec Loss 5.3523 LearningRate 0.0292 Epoch: 15 Global Step: 78140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:19,198-Speed 3429.32 samples/sec Loss 5.2673 LearningRate 0.0292 Epoch: 15 Global Step: 78150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:22,178-Speed 3438.52 samples/sec Loss 5.3271 LearningRate 0.0292 Epoch: 15 Global Step: 78160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:25,163-Speed 3431.16 samples/sec Loss 5.2455 LearningRate 0.0292 Epoch: 15 Global Step: 78170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:28,181-Speed 3393.79 samples/sec Loss 5.3633 LearningRate 0.0291 Epoch: 15 Global Step: 78180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:31,168-Speed 3428.88 samples/sec Loss 5.2342 LearningRate 0.0291 Epoch: 15 Global Step: 78190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:01:34,152-Speed 3432.11 samples/sec Loss 5.2654 LearningRate 0.0291 Epoch: 15 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-20 01:01:37,218-Speed 3341.55 samples/sec Loss 5.3311 LearningRate 0.0291 Epoch: 15 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:01:40,262-Speed 3363.87 samples/sec Loss 5.2903 LearningRate 0.0291 Epoch: 15 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:01:43,258-Speed 3419.79 samples/sec Loss 5.4673 LearningRate 0.0291 Epoch: 15 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:01:46,238-Speed 3436.85 samples/sec Loss 5.3106 LearningRate 0.0291 Epoch: 15 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:01:49,248-Speed 3402.82 samples/sec Loss 5.2702 LearningRate 0.0291 Epoch: 15 Global Step: 78250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:01:52,232-Speed 3432.63 samples/sec Loss 5.1696 LearningRate 0.0290 Epoch: 15 Global Step: 78260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:01:55,245-Speed 3399.03 samples/sec Loss 5.3488 LearningRate 0.0290 Epoch: 15 Global Step: 78270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:01:58,297-Speed 3356.18 samples/sec Loss 5.2727 LearningRate 0.0290 Epoch: 15 Global Step: 78280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:01,447-Speed 3252.08 samples/sec Loss 5.1205 LearningRate 0.0290 Epoch: 15 Global Step: 78290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:04,412-Speed 3453.75 samples/sec Loss 5.3699 LearningRate 0.0290 Epoch: 15 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:07,406-Speed 3421.07 samples/sec Loss 5.3820 LearningRate 0.0290 Epoch: 15 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:10,385-Speed 3439.20 samples/sec Loss 5.1254 LearningRate 0.0290 Epoch: 15 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:13,366-Speed 3436.51 
samples/sec Loss 5.3262 LearningRate 0.0290 Epoch: 15 Global Step: 78330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:16,353-Speed 3428.26 samples/sec Loss 5.3411 LearningRate 0.0290 Epoch: 15 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:19,348-Speed 3420.80 samples/sec Loss 5.2472 LearningRate 0.0289 Epoch: 15 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:22,330-Speed 3434.04 samples/sec Loss 5.2727 LearningRate 0.0289 Epoch: 15 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:25,390-Speed 3347.24 samples/sec Loss 5.2598 LearningRate 0.0289 Epoch: 15 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:02:28,354-Speed 3455.33 samples/sec Loss 5.1160 LearningRate 0.0289 Epoch: 15 Global Step: 78380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:31,385-Speed 3380.07 samples/sec Loss 5.4427 LearningRate 0.0289 Epoch: 15 Global Step: 78390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:34,467-Speed 3323.49 samples/sec Loss 5.4395 LearningRate 0.0289 Epoch: 15 Global Step: 78400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:37,608-Speed 3260.92 samples/sec Loss 5.1744 LearningRate 0.0289 Epoch: 15 Global Step: 78410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:40,694-Speed 3318.93 samples/sec Loss 5.1325 LearningRate 0.0289 Epoch: 15 Global Step: 78420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:43,696-Speed 3411.91 samples/sec Loss 5.1450 LearningRate 0.0288 Epoch: 15 Global Step: 78430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:46,678-Speed 3435.55 samples/sec Loss 5.2949 LearningRate 0.0288 Epoch: 15 Global Step: 78440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:49,677-Speed 3415.68 samples/sec Loss 5.1846 LearningRate 0.0288 Epoch: 15 
Global Step: 78450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:52,667-Speed 3425.23 samples/sec Loss 5.2266 LearningRate 0.0288 Epoch: 15 Global Step: 78460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:55,675-Speed 3405.11 samples/sec Loss 5.1820 LearningRate 0.0288 Epoch: 15 Global Step: 78470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:02:58,654-Speed 3438.59 samples/sec Loss 5.3249 LearningRate 0.0288 Epoch: 15 Global Step: 78480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:01,629-Speed 3442.73 samples/sec Loss 5.1864 LearningRate 0.0288 Epoch: 15 Global Step: 78490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:04,605-Speed 3441.89 samples/sec Loss 5.2415 LearningRate 0.0288 Epoch: 15 Global Step: 78500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:07,585-Speed 3437.78 samples/sec Loss 5.1222 LearningRate 0.0287 Epoch: 15 Global Step: 78510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:10,599-Speed 3398.28 samples/sec Loss 5.2360 LearningRate 0.0287 Epoch: 15 Global Step: 78520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:13,589-Speed 3425.51 samples/sec Loss 5.1900 LearningRate 0.0287 Epoch: 15 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:16,629-Speed 3368.94 samples/sec Loss 5.2112 LearningRate 0.0287 Epoch: 15 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:19,645-Speed 3396.60 samples/sec Loss 5.0835 LearningRate 0.0287 Epoch: 15 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:22,625-Speed 3436.28 samples/sec Loss 5.2692 LearningRate 0.0287 Epoch: 15 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:25,602-Speed 3440.64 samples/sec Loss 5.3341 LearningRate 0.0287 Epoch: 15 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-20 01:03:28,649-Speed 3361.85 samples/sec Loss 5.0089 LearningRate 0.0287 Epoch: 15 Global Step: 78580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:03:31,622-Speed 3444.81 samples/sec Loss 5.0651 LearningRate 0.0287 Epoch: 15 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:34,657-Speed 3375.48 samples/sec Loss 5.1004 LearningRate 0.0286 Epoch: 15 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:37,648-Speed 3424.17 samples/sec Loss 5.1155 LearningRate 0.0286 Epoch: 15 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:40,655-Speed 3407.15 samples/sec Loss 5.2559 LearningRate 0.0286 Epoch: 15 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:03:43,649-Speed 3420.91 samples/sec Loss 5.2559 LearningRate 0.0286 Epoch: 15 Global Step: 78630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:03:46,650-Speed 3412.67 samples/sec Loss 5.1189 LearningRate 0.0286 Epoch: 15 Global Step: 78640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:03:49,691-Speed 3368.29 samples/sec Loss 5.1540 LearningRate 0.0286 Epoch: 15 Global Step: 78650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:03:52,713-Speed 3389.17 samples/sec Loss 5.3080 LearningRate 0.0286 Epoch: 15 Global Step: 78660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:03:55,712-Speed 3415.16 samples/sec Loss 5.2167 LearningRate 0.0286 Epoch: 15 Global Step: 78670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:03:58,707-Speed 3419.93 samples/sec Loss 5.3092 LearningRate 0.0285 Epoch: 15 Global Step: 78680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:04:01,693-Speed 3430.55 samples/sec Loss 5.1651 LearningRate 0.0285 Epoch: 15 Global Step: 78690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:04:04,697-Speed 3409.81 
samples/sec Loss 5.2350 LearningRate 0.0285 Epoch: 15 Global Step: 78700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:04:07,677-Speed 3436.95 samples/sec Loss 5.2589 LearningRate 0.0285 Epoch: 15 Global Step: 78710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:04:10,657-Speed 3438.20 samples/sec Loss 5.2230 LearningRate 0.0285 Epoch: 15 Global Step: 78720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:04:13,646-Speed 3425.90 samples/sec Loss 5.2838 LearningRate 0.0285 Epoch: 15 Global Step: 78730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:16,630-Speed 3433.58 samples/sec Loss 5.2016 LearningRate 0.0285 Epoch: 15 Global Step: 78740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:19,619-Speed 3425.78 samples/sec Loss 5.1020 LearningRate 0.0285 Epoch: 15 Global Step: 78750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:22,603-Speed 3432.94 samples/sec Loss 5.1839 LearningRate 0.0284 Epoch: 15 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:25,589-Speed 3430.65 samples/sec Loss 5.1033 LearningRate 0.0284 Epoch: 15 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:28,634-Speed 3363.35 samples/sec Loss 5.2317 LearningRate 0.0284 Epoch: 15 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:31,650-Speed 3396.34 samples/sec Loss 5.2102 LearningRate 0.0284 Epoch: 15 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:34,654-Speed 3409.32 samples/sec Loss 5.2651 LearningRate 0.0284 Epoch: 15 Global Step: 78800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:37,641-Speed 3429.58 samples/sec Loss 5.1662 LearningRate 0.0284 Epoch: 15 Global Step: 78810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:40,672-Speed 3378.60 samples/sec Loss 5.1652 LearningRate 0.0284 Epoch: 15 
Global Step: 78820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:43,652-Speed 3436.99 samples/sec Loss 5.1212 LearningRate 0.0284 Epoch: 15 Global Step: 78830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:04:46,635-Speed 3434.63 samples/sec Loss 5.1979 LearningRate 0.0284 Epoch: 15 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:49,658-Speed 3387.63 samples/sec Loss 5.0367 LearningRate 0.0283 Epoch: 15 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:52,701-Speed 3366.33 samples/sec Loss 5.2596 LearningRate 0.0283 Epoch: 15 Global Step: 78860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:55,729-Speed 3383.27 samples/sec Loss 5.3085 LearningRate 0.0283 Epoch: 15 Global Step: 78870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:04:58,707-Speed 3439.21 samples/sec Loss 5.2782 LearningRate 0.0283 Epoch: 15 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:01,684-Speed 3441.68 samples/sec Loss 5.2137 LearningRate 0.0283 Epoch: 15 Global Step: 78890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:04,668-Speed 3431.55 samples/sec Loss 5.1043 LearningRate 0.0283 Epoch: 15 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:07,650-Speed 3435.43 samples/sec Loss 5.1535 LearningRate 0.0283 Epoch: 15 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:10,632-Speed 3435.04 samples/sec Loss 5.2958 LearningRate 0.0283 Epoch: 15 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:13,613-Speed 3435.34 samples/sec Loss 5.2967 LearningRate 0.0282 Epoch: 15 Global Step: 78930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:16,587-Speed 3445.07 samples/sec Loss 5.2180 LearningRate 0.0282 Epoch: 15 Global Step: 78940 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-20 01:05:19,551-Speed 3454.70 samples/sec Loss 5.1747 LearningRate 0.0282 Epoch: 15 Global Step: 78950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:22,541-Speed 3426.78 samples/sec Loss 5.3218 LearningRate 0.0282 Epoch: 15 Global Step: 78960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:25,529-Speed 3427.86 samples/sec Loss 5.3071 LearningRate 0.0282 Epoch: 15 Global Step: 78970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:28,513-Speed 3432.43 samples/sec Loss 5.3409 LearningRate 0.0282 Epoch: 15 Global Step: 78980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:31,500-Speed 3429.30 samples/sec Loss 5.2546 LearningRate 0.0282 Epoch: 15 Global Step: 78990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:34,481-Speed 3435.25 samples/sec Loss 5.0555 LearningRate 0.0282 Epoch: 15 Global Step: 79000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:37,517-Speed 3374.26 samples/sec Loss 5.3089 LearningRate 0.0282 Epoch: 15 Global Step: 79010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:40,513-Speed 3419.29 samples/sec Loss 5.0864 LearningRate 0.0281 Epoch: 15 Global Step: 79020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:43,506-Speed 3421.69 samples/sec Loss 5.2362 LearningRate 0.0281 Epoch: 15 Global Step: 79030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:46,508-Speed 3411.59 samples/sec Loss 5.1852 LearningRate 0.0281 Epoch: 15 Global Step: 79040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:05:49,497-Speed 3427.10 samples/sec Loss 5.1590 LearningRate 0.0281 Epoch: 15 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:52,489-Speed 3423.71 samples/sec Loss 5.0160 LearningRate 0.0281 Epoch: 15 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:55,469-Speed 3436.78 
samples/sec Loss 5.2972 LearningRate 0.0281 Epoch: 15 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:05:58,509-Speed 3369.98 samples/sec Loss 5.2242 LearningRate 0.0281 Epoch: 15 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:01,547-Speed 3371.13 samples/sec Loss 5.2373 LearningRate 0.0281 Epoch: 15 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:04,538-Speed 3424.39 samples/sec Loss 5.1122 LearningRate 0.0280 Epoch: 15 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:07,556-Speed 3394.39 samples/sec Loss 5.2862 LearningRate 0.0280 Epoch: 15 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:10,588-Speed 3377.75 samples/sec Loss 5.2890 LearningRate 0.0280 Epoch: 15 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:13,588-Speed 3414.42 samples/sec Loss 5.4548 LearningRate 0.0280 Epoch: 15 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:16,591-Speed 3410.70 samples/sec Loss 5.3377 LearningRate 0.0280 Epoch: 15 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:19,593-Speed 3412.67 samples/sec Loss 5.1383 LearningRate 0.0280 Epoch: 15 Global Step: 79150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:06:22,651-Speed 3349.03 samples/sec Loss 5.1766 LearningRate 0.0280 Epoch: 15 Global Step: 79160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:25,675-Speed 3387.04 samples/sec Loss 5.3137 LearningRate 0.0280 Epoch: 15 Global Step: 79170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:28,664-Speed 3427.57 samples/sec Loss 5.2480 LearningRate 0.0279 Epoch: 15 Global Step: 79180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:31,654-Speed 3424.75 samples/sec Loss 5.1219 LearningRate 0.0279 Epoch: 15 
Global Step: 79190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:34,647-Speed 3422.89 samples/sec Loss 5.1236 LearningRate 0.0279 Epoch: 15 Global Step: 79200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:37,645-Speed 3416.13 samples/sec Loss 5.4099 LearningRate 0.0279 Epoch: 15 Global Step: 79210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:40,634-Speed 3427.18 samples/sec Loss 5.1749 LearningRate 0.0279 Epoch: 15 Global Step: 79220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:43,641-Speed 3406.31 samples/sec Loss 5.1599 LearningRate 0.0279 Epoch: 15 Global Step: 79230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:46,632-Speed 3425.20 samples/sec Loss 5.1692 LearningRate 0.0279 Epoch: 15 Global Step: 79240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:49,617-Speed 3430.76 samples/sec Loss 5.2156 LearningRate 0.0279 Epoch: 15 Global Step: 79250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:06:52,605-Speed 3428.94 samples/sec Loss 5.0057 LearningRate 0.0279 Epoch: 15 Global Step: 79260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:55,598-Speed 3421.88 samples/sec Loss 5.1901 LearningRate 0.0278 Epoch: 15 Global Step: 79270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:06:58,562-Speed 3454.88 samples/sec Loss 5.1355 LearningRate 0.0278 Epoch: 15 Global Step: 79280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:01,544-Speed 3435.02 samples/sec Loss 5.1364 LearningRate 0.0278 Epoch: 15 Global Step: 79290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:04,547-Speed 3410.71 samples/sec Loss 5.2159 LearningRate 0.0278 Epoch: 15 Global Step: 79300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:07,542-Speed 3420.32 samples/sec Loss 5.2666 LearningRate 0.0278 Epoch: 15 Global Step: 79310 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-01-20 01:07:10,554-Speed 3400.95 samples/sec Loss 5.3796 LearningRate 0.0278 Epoch: 15 Global Step: 79320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:13,537-Speed 3432.76 samples/sec Loss 5.2103 LearningRate 0.0278 Epoch: 15 Global Step: 79330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:16,537-Speed 3414.77 samples/sec Loss 5.3201 LearningRate 0.0278 Epoch: 15 Global Step: 79340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:19,527-Speed 3425.89 samples/sec Loss 5.2865 LearningRate 0.0277 Epoch: 15 Global Step: 79350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:22,507-Speed 3437.73 samples/sec Loss 5.0798 LearningRate 0.0277 Epoch: 15 Global Step: 79360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:25,541-Speed 3375.85 samples/sec Loss 5.3501 LearningRate 0.0277 Epoch: 15 Global Step: 79370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:28,528-Speed 3429.02 samples/sec Loss 5.2177 LearningRate 0.0277 Epoch: 15 Global Step: 79380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:07:31,519-Speed 3424.07 samples/sec Loss 5.1361 LearningRate 0.0277 Epoch: 15 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:07:34,501-Speed 3435.33 samples/sec Loss 5.1223 LearningRate 0.0277 Epoch: 15 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:07:37,493-Speed 3423.66 samples/sec Loss 5.1507 LearningRate 0.0277 Epoch: 15 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:07:40,474-Speed 3435.30 samples/sec Loss 5.0978 LearningRate 0.0277 Epoch: 15 Global Step: 79420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:43,484-Speed 3403.64 samples/sec Loss 5.2893 LearningRate 0.0277 Epoch: 15 Global Step: 79430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:46,473-Speed 3426.16 
samples/sec Loss 5.2493 LearningRate 0.0276 Epoch: 15 Global Step: 79440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:49,456-Speed 3433.41 samples/sec Loss 5.1973 LearningRate 0.0276 Epoch: 15 Global Step: 79450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:52,453-Speed 3417.90 samples/sec Loss 5.0957 LearningRate 0.0276 Epoch: 15 Global Step: 79460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:55,435-Speed 3434.87 samples/sec Loss 5.2153 LearningRate 0.0276 Epoch: 15 Global Step: 79470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:07:58,484-Speed 3359.66 samples/sec Loss 5.1028 LearningRate 0.0276 Epoch: 15 Global Step: 79480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:01,484-Speed 3415.10 samples/sec Loss 5.2160 LearningRate 0.0276 Epoch: 15 Global Step: 79490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:04,466-Speed 3435.49 samples/sec Loss 5.2056 LearningRate 0.0276 Epoch: 15 Global Step: 79500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:07,480-Speed 3398.47 samples/sec Loss 5.2248 LearningRate 0.0276 Epoch: 15 Global Step: 79510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:10,464-Speed 3432.34 samples/sec Loss 5.1290 LearningRate 0.0275 Epoch: 15 Global Step: 79520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:08:13,480-Speed 3396.22 samples/sec Loss 5.2612 LearningRate 0.0275 Epoch: 15 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:08:16,465-Speed 3431.72 samples/sec Loss 5.1913 LearningRate 0.0275 Epoch: 15 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:08:19,461-Speed 3418.35 samples/sec Loss 5.2428 LearningRate 0.0275 Epoch: 15 Global Step: 79550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:08:22,457-Speed 3419.47 samples/sec Loss 5.2253 LearningRate 0.0275 Epoch: 15 
Global Step: 79560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:08:25,450-Speed 3422.07 samples/sec Loss 5.2299 LearningRate 0.0275 Epoch: 15 Global Step: 79570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:08:28,421-Speed 3446.46 samples/sec Loss 5.1355 LearningRate 0.0275 Epoch: 15 Global Step: 79580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:31,400-Speed 3439.61 samples/sec Loss 5.1297 LearningRate 0.0275 Epoch: 15 Global Step: 79590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:34,379-Speed 3437.87 samples/sec Loss 5.1270 LearningRate 0.0275 Epoch: 15 Global Step: 79600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:37,456-Speed 3329.12 samples/sec Loss 5.1873 LearningRate 0.0274 Epoch: 15 Global Step: 79610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:40,479-Speed 3387.67 samples/sec Loss 5.2060 LearningRate 0.0274 Epoch: 15 Global Step: 79620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:43,462-Speed 3434.47 samples/sec Loss 5.0676 LearningRate 0.0274 Epoch: 15 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:46,450-Speed 3427.36 samples/sec Loss 5.1593 LearningRate 0.0274 Epoch: 15 Global Step: 79640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:49,521-Speed 3336.15 samples/sec Loss 5.1803 LearningRate 0.0274 Epoch: 15 Global Step: 79650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:52,519-Speed 3416.31 samples/sec Loss 5.2152 LearningRate 0.0274 Epoch: 15 Global Step: 79660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:55,582-Speed 3344.14 samples/sec Loss 5.2099 LearningRate 0.0274 Epoch: 15 Global Step: 79670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:08:58,624-Speed 3366.85 samples/sec Loss 5.3756 LearningRate 0.0274 Epoch: 15 Global Step: 79680 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-20 01:09:01,707-Speed 3322.98 samples/sec Loss 5.2069 LearningRate 0.0273 Epoch: 15 Global Step: 79690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:04,746-Speed 3370.17 samples/sec Loss 5.2213 LearningRate 0.0273 Epoch: 15 Global Step: 79700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:07,811-Speed 3342.05 samples/sec Loss 5.2766 LearningRate 0.0273 Epoch: 15 Global Step: 79710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:10,794-Speed 3433.19 samples/sec Loss 5.1250 LearningRate 0.0273 Epoch: 15 Global Step: 79720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:13,795-Speed 3413.90 samples/sec Loss 5.1390 LearningRate 0.0273 Epoch: 15 Global Step: 79730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:16,792-Speed 3417.67 samples/sec Loss 5.2888 LearningRate 0.0273 Epoch: 15 Global Step: 79740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:19,859-Speed 3339.55 samples/sec Loss 5.1852 LearningRate 0.0273 Epoch: 15 Global Step: 79750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:22,980-Speed 3281.76 samples/sec Loss 5.3468 LearningRate 0.0273 Epoch: 15 Global Step: 79760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:26,151-Speed 3229.53 samples/sec Loss 5.1148 LearningRate 0.0273 Epoch: 15 Global Step: 79770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:29,169-Speed 3394.43 samples/sec Loss 5.1929 LearningRate 0.0272 Epoch: 15 Global Step: 79780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:09:32,380-Speed 3190.38 samples/sec Loss 5.1759 LearningRate 0.0272 Epoch: 15 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:35,523-Speed 3258.74 samples/sec Loss 5.2495 LearningRate 0.0272 Epoch: 15 Global Step: 79800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:38,529-Speed 3406.38 
samples/sec Loss 5.2126 LearningRate 0.0272 Epoch: 15 Global Step: 79810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:41,530-Speed 3413.38 samples/sec Loss 5.2461 LearningRate 0.0272 Epoch: 15 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:44,512-Speed 3434.78 samples/sec Loss 5.0739 LearningRate 0.0272 Epoch: 15 Global Step: 79830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:47,501-Speed 3427.23 samples/sec Loss 5.0705 LearningRate 0.0272 Epoch: 15 Global Step: 79840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:50,483-Speed 3434.28 samples/sec Loss 5.2526 LearningRate 0.0272 Epoch: 15 Global Step: 79850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:53,473-Speed 3426.11 samples/sec Loss 5.1917 LearningRate 0.0272 Epoch: 15 Global Step: 79860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:56,458-Speed 3430.96 samples/sec Loss 4.9496 LearningRate 0.0271 Epoch: 15 Global Step: 79870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:09:59,485-Speed 3384.52 samples/sec Loss 5.1582 LearningRate 0.0271 Epoch: 15 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:10:02,457-Speed 3446.52 samples/sec Loss 5.1132 LearningRate 0.0271 Epoch: 15 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:10:05,423-Speed 3452.63 samples/sec Loss 5.0535 LearningRate 0.0271 Epoch: 15 Global Step: 79900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:08,465-Speed 3367.95 samples/sec Loss 4.9651 LearningRate 0.0271 Epoch: 15 Global Step: 79910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:11,460-Speed 3419.11 samples/sec Loss 5.2900 LearningRate 0.0271 Epoch: 15 Global Step: 79920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:14,452-Speed 3424.25 samples/sec Loss 5.1606 LearningRate 0.0271 Epoch: 15 
Global Step: 79930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:17,447-Speed 3419.90 samples/sec Loss 5.2099 LearningRate 0.0271 Epoch: 15 Global Step: 79940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:20,481-Speed 3375.09 samples/sec Loss 5.2694 LearningRate 0.0270 Epoch: 15 Global Step: 79950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:23,469-Speed 3427.87 samples/sec Loss 5.1620 LearningRate 0.0270 Epoch: 15 Global Step: 79960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:26,496-Speed 3384.15 samples/sec Loss 5.0689 LearningRate 0.0270 Epoch: 15 Global Step: 79970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:29,513-Speed 3395.82 samples/sec Loss 5.3569 LearningRate 0.0270 Epoch: 15 Global Step: 79980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:32,498-Speed 3430.83 samples/sec Loss 5.2387 LearningRate 0.0270 Epoch: 15 Global Step: 79990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:10:35,480-Speed 3434.96 samples/sec Loss 5.2263 LearningRate 0.0270 Epoch: 15 Global Step: 80000 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-20 01:11:18,725-[lfw][80000]XNorm: 22.978001
Training: 2022-01-20 01:11:18,726-[lfw][80000]Accuracy-Flip: 0.99783+-0.00224
Training: 2022-01-20 01:11:18,726-[lfw][80000]Accuracy-Highest: 0.99817
Training: 2022-01-20 01:12:08,840-[cfp_fp][80000]XNorm: 20.841562
Training: 2022-01-20 01:12:08,841-[cfp_fp][80000]Accuracy-Flip: 0.97914+-0.00661
Training: 2022-01-20 01:12:08,841-[cfp_fp][80000]Accuracy-Highest: 0.97914
Training: 2022-01-20 01:12:51,895-[agedb_30][80000]XNorm: 22.756596
Training: 2022-01-20 01:12:51,896-[agedb_30][80000]Accuracy-Flip: 0.97983+-0.00769
Training: 2022-01-20 01:12:51,896-[agedb_30][80000]Accuracy-Highest: 0.98100
Training: 2022-01-20 01:12:54,899-Speed 73.45 samples/sec Loss 5.2144 LearningRate 0.0270 Epoch: 15 Global Step: 80010 Fp16 Grad Scale: 65536
Required: 5 hours Training: 2022-01-20 01:12:57,861-Speed 3458.09 samples/sec Loss 5.1949 LearningRate 0.0270 Epoch: 15 Global Step: 80020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:00,833-Speed 3446.38 samples/sec Loss 5.2443 LearningRate 0.0270 Epoch: 15 Global Step: 80030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:03,803-Speed 3449.23 samples/sec Loss 5.1422 LearningRate 0.0269 Epoch: 15 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:06,777-Speed 3443.80 samples/sec Loss 5.1390 LearningRate 0.0269 Epoch: 15 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:09,765-Speed 3428.32 samples/sec Loss 5.1350 LearningRate 0.0269 Epoch: 15 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:12,754-Speed 3426.35 samples/sec Loss 5.0555 LearningRate 0.0269 Epoch: 15 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:15,752-Speed 3416.94 samples/sec Loss 4.9967 LearningRate 0.0269 Epoch: 15 Global Step: 80080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:18,724-Speed 3445.96 samples/sec Loss 5.1464 LearningRate 0.0269 Epoch: 15 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:21,698-Speed 3444.58 samples/sec Loss 5.1375 LearningRate 0.0269 Epoch: 15 Global Step: 80100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:13:24,744-Speed 3362.82 samples/sec Loss 5.1794 LearningRate 0.0269 Epoch: 15 Global Step: 80110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:13:27,771-Speed 3384.14 samples/sec Loss 5.2157 LearningRate 0.0268 Epoch: 15 Global Step: 80120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:30,755-Speed 3431.78 samples/sec Loss 5.2595 LearningRate 0.0268 Epoch: 15 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 
01:13:33,737-Speed 3435.39 samples/sec Loss 5.1727 LearningRate 0.0268 Epoch: 15 Global Step: 80140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:36,718-Speed 3436.04 samples/sec Loss 4.9929 LearningRate 0.0268 Epoch: 15 Global Step: 80150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:39,714-Speed 3419.73 samples/sec Loss 5.0536 LearningRate 0.0268 Epoch: 15 Global Step: 80160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:42,698-Speed 3432.74 samples/sec Loss 5.1795 LearningRate 0.0268 Epoch: 15 Global Step: 80170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:45,684-Speed 3429.36 samples/sec Loss 5.1936 LearningRate 0.0268 Epoch: 15 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:48,664-Speed 3437.98 samples/sec Loss 5.0882 LearningRate 0.0268 Epoch: 15 Global Step: 80190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:51,649-Speed 3430.27 samples/sec Loss 5.1997 LearningRate 0.0268 Epoch: 15 Global Step: 80200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:54,645-Speed 3420.35 samples/sec Loss 5.1330 LearningRate 0.0267 Epoch: 15 Global Step: 80210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:13:57,618-Speed 3445.31 samples/sec Loss 5.0872 LearningRate 0.0267 Epoch: 15 Global Step: 80220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:00,596-Speed 3438.85 samples/sec Loss 5.2256 LearningRate 0.0267 Epoch: 15 Global Step: 80230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:03,580-Speed 3433.06 samples/sec Loss 5.1814 LearningRate 0.0267 Epoch: 15 Global Step: 80240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:06,590-Speed 3402.86 samples/sec Loss 5.1131 LearningRate 0.0267 Epoch: 15 Global Step: 80250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:09,729-Speed 3263.53 samples/sec Loss 5.1514 
LearningRate 0.0267 Epoch: 15 Global Step: 80260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:12,717-Speed 3427.85 samples/sec Loss 5.1832 LearningRate 0.0267 Epoch: 15 Global Step: 80270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:15,717-Speed 3414.01 samples/sec Loss 5.1790 LearningRate 0.0267 Epoch: 15 Global Step: 80280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:18,708-Speed 3424.38 samples/sec Loss 5.1387 LearningRate 0.0267 Epoch: 15 Global Step: 80290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:21,689-Speed 3436.64 samples/sec Loss 5.1505 LearningRate 0.0266 Epoch: 15 Global Step: 80300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:24,673-Speed 3432.99 samples/sec Loss 5.1263 LearningRate 0.0266 Epoch: 15 Global Step: 80310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:27,652-Speed 3438.29 samples/sec Loss 5.2159 LearningRate 0.0266 Epoch: 15 Global Step: 80320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:30,640-Speed 3427.64 samples/sec Loss 5.1657 LearningRate 0.0266 Epoch: 15 Global Step: 80330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:33,654-Speed 3398.13 samples/sec Loss 5.0814 LearningRate 0.0266 Epoch: 15 Global Step: 80340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:36,630-Speed 3442.27 samples/sec Loss 5.1694 LearningRate 0.0266 Epoch: 15 Global Step: 80350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:39,607-Speed 3442.36 samples/sec Loss 5.1257 LearningRate 0.0266 Epoch: 15 Global Step: 80360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:42,580-Speed 3445.24 samples/sec Loss 5.0574 LearningRate 0.0266 Epoch: 15 Global Step: 80370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:14:45,573-Speed 3421.89 samples/sec Loss 5.0424 LearningRate 0.0265 Epoch: 15 Global Step: 80380 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:48,559-Speed 3430.09 samples/sec Loss 5.0853 LearningRate 0.0265 Epoch: 15 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:51,541-Speed 3435.54 samples/sec Loss 5.1874 LearningRate 0.0265 Epoch: 15 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:54,517-Speed 3441.91 samples/sec Loss 5.1636 LearningRate 0.0265 Epoch: 15 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:14:57,494-Speed 3440.50 samples/sec Loss 5.1858 LearningRate 0.0265 Epoch: 15 Global Step: 80420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:00,469-Speed 3442.69 samples/sec Loss 5.1504 LearningRate 0.0265 Epoch: 15 Global Step: 80430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:03,464-Speed 3419.93 samples/sec Loss 5.1305 LearningRate 0.0265 Epoch: 15 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:06,475-Speed 3401.38 samples/sec Loss 4.9402 LearningRate 0.0265 Epoch: 15 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:09,451-Speed 3442.59 samples/sec Loss 5.1563 LearningRate 0.0265 Epoch: 15 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:12,435-Speed 3431.38 samples/sec Loss 5.1476 LearningRate 0.0264 Epoch: 15 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:15,424-Speed 3427.70 samples/sec Loss 5.1573 LearningRate 0.0264 Epoch: 15 Global Step: 80480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:15:18,402-Speed 3439.26 samples/sec Loss 5.1241 LearningRate 0.0264 Epoch: 15 Global Step: 80490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:15:21,388-Speed 3430.29 samples/sec Loss 5.0589 LearningRate 0.0264 Epoch: 15 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-01-20 01:15:24,362-Speed 3444.81 samples/sec Loss 5.1721 LearningRate 0.0264 Epoch: 15 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:27,342-Speed 3436.64 samples/sec Loss 5.3609 LearningRate 0.0264 Epoch: 15 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:30,319-Speed 3441.12 samples/sec Loss 5.1170 LearningRate 0.0264 Epoch: 15 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:33,298-Speed 3438.38 samples/sec Loss 5.1770 LearningRate 0.0264 Epoch: 15 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:36,282-Speed 3431.83 samples/sec Loss 5.2158 LearningRate 0.0264 Epoch: 15 Global Step: 80550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:39,272-Speed 3425.65 samples/sec Loss 5.1788 LearningRate 0.0263 Epoch: 15 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:42,248-Speed 3441.83 samples/sec Loss 5.1804 LearningRate 0.0263 Epoch: 15 Global Step: 80570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:45,224-Speed 3441.90 samples/sec Loss 5.0380 LearningRate 0.0263 Epoch: 15 Global Step: 80580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:48,198-Speed 3444.52 samples/sec Loss 5.1027 LearningRate 0.0263 Epoch: 15 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:15:51,173-Speed 3443.43 samples/sec Loss 5.1339 LearningRate 0.0263 Epoch: 15 Global Step: 80600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:15:54,155-Speed 3434.85 samples/sec Loss 5.0595 LearningRate 0.0263 Epoch: 15 Global Step: 80610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:15:57,164-Speed 3403.16 samples/sec Loss 5.1003 LearningRate 0.0263 Epoch: 15 Global Step: 80620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:00,168-Speed 3410.41 samples/sec 
Loss 5.0522 LearningRate 0.0263 Epoch: 15 Global Step: 80630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:03,161-Speed 3422.15 samples/sec Loss 5.0695 LearningRate 0.0262 Epoch: 15 Global Step: 80640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:06,139-Speed 3438.63 samples/sec Loss 5.1005 LearningRate 0.0262 Epoch: 15 Global Step: 80650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:09,196-Speed 3350.84 samples/sec Loss 5.1005 LearningRate 0.0262 Epoch: 15 Global Step: 80660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:12,201-Speed 3408.64 samples/sec Loss 5.1265 LearningRate 0.0262 Epoch: 15 Global Step: 80670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:15,213-Speed 3401.71 samples/sec Loss 5.1293 LearningRate 0.0262 Epoch: 15 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:18,223-Speed 3402.31 samples/sec Loss 5.0790 LearningRate 0.0262 Epoch: 15 Global Step: 80690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:21,255-Speed 3378.28 samples/sec Loss 5.0693 LearningRate 0.0262 Epoch: 15 Global Step: 80700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:24,241-Speed 3430.38 samples/sec Loss 5.0821 LearningRate 0.0262 Epoch: 15 Global Step: 80710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:27,228-Speed 3428.63 samples/sec Loss 5.1810 LearningRate 0.0262 Epoch: 15 Global Step: 80720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:30,236-Speed 3405.58 samples/sec Loss 5.1249 LearningRate 0.0261 Epoch: 15 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:33,359-Speed 3279.36 samples/sec Loss 5.1550 LearningRate 0.0261 Epoch: 15 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:36,347-Speed 3428.35 samples/sec Loss 5.0898 LearningRate 0.0261 Epoch: 15 Global 
Step: 80750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:39,414-Speed 3340.53 samples/sec Loss 5.1842 LearningRate 0.0261 Epoch: 15 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:42,473-Speed 3348.08 samples/sec Loss 5.1812 LearningRate 0.0261 Epoch: 15 Global Step: 80770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:45,476-Speed 3410.89 samples/sec Loss 5.1475 LearningRate 0.0261 Epoch: 15 Global Step: 80780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:48,485-Speed 3403.86 samples/sec Loss 5.0705 LearningRate 0.0261 Epoch: 15 Global Step: 80790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:51,469-Speed 3432.14 samples/sec Loss 5.0385 LearningRate 0.0261 Epoch: 15 Global Step: 80800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:16:54,589-Speed 3444.19 samples/sec Loss 5.2543 LearningRate 0.0261 Epoch: 15 Global Step: 80810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:16:57,566-Speed 3440.21 samples/sec Loss 5.0183 LearningRate 0.0260 Epoch: 15 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:01,296-Speed 3377.40 samples/sec Loss 5.0310 LearningRate 0.0260 Epoch: 15 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:04,336-Speed 3432.30 samples/sec Loss 5.0850 LearningRate 0.0260 Epoch: 15 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:07,339-Speed 3411.82 samples/sec Loss 5.0851 LearningRate 0.0260 Epoch: 15 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:10,516-Speed 3297.57 samples/sec Loss 5.1118 LearningRate 0.0260 Epoch: 15 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:13,588-Speed 3334.13 samples/sec Loss 4.9624 LearningRate 0.0260 Epoch: 15 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-20 01:17:16,594-Speed 3406.60 samples/sec Loss 5.1748 LearningRate 0.0260 Epoch: 15 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:19,573-Speed 3439.31 samples/sec Loss 5.0180 LearningRate 0.0260 Epoch: 15 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:22,556-Speed 3433.50 samples/sec Loss 5.1076 LearningRate 0.0260 Epoch: 15 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:25,534-Speed 3440.24 samples/sec Loss 5.0803 LearningRate 0.0259 Epoch: 15 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:28,606-Speed 3333.51 samples/sec Loss 5.1084 LearningRate 0.0259 Epoch: 15 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:41,184-Speed 814.22 samples/sec Loss 4.9098 LearningRate 0.0259 Epoch: 16 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:44,198-Speed 3398.45 samples/sec Loss 4.2197 LearningRate 0.0259 Epoch: 16 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:47,203-Speed 3409.53 samples/sec Loss 4.3703 LearningRate 0.0259 Epoch: 16 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:50,277-Speed 3331.85 samples/sec Loss 4.1523 LearningRate 0.0259 Epoch: 16 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:53,262-Speed 3430.99 samples/sec Loss 4.2604 LearningRate 0.0259 Epoch: 16 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:56,254-Speed 3422.83 samples/sec Loss 4.3328 LearningRate 0.0259 Epoch: 16 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:17:59,328-Speed 3332.20 samples/sec Loss 4.3594 LearningRate 0.0258 Epoch: 16 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:02,316-Speed 3428.61 
samples/sec Loss 4.2393 LearningRate 0.0258 Epoch: 16 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:05,370-Speed 3354.31 samples/sec Loss 4.3561 LearningRate 0.0258 Epoch: 16 Global Step: 81010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:18:08,405-Speed 3374.82 samples/sec Loss 4.3146 LearningRate 0.0258 Epoch: 16 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:11,449-Speed 3364.36 samples/sec Loss 4.2550 LearningRate 0.0258 Epoch: 16 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:14,456-Speed 3406.44 samples/sec Loss 4.2631 LearningRate 0.0258 Epoch: 16 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:17,453-Speed 3417.78 samples/sec Loss 4.2757 LearningRate 0.0258 Epoch: 16 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:20,440-Speed 3430.18 samples/sec Loss 4.3208 LearningRate 0.0258 Epoch: 16 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:23,450-Speed 3403.13 samples/sec Loss 4.3780 LearningRate 0.0258 Epoch: 16 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:26,438-Speed 3428.37 samples/sec Loss 4.3787 LearningRate 0.0257 Epoch: 16 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:29,594-Speed 3244.86 samples/sec Loss 4.2190 LearningRate 0.0257 Epoch: 16 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:32,694-Speed 3303.31 samples/sec Loss 4.3678 LearningRate 0.0257 Epoch: 16 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:35,740-Speed 3363.37 samples/sec Loss 4.4839 LearningRate 0.0257 Epoch: 16 Global Step: 81110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:38,810-Speed 3336.05 samples/sec Loss 4.3799 LearningRate 0.0257 Epoch: 16 
Global Step: 81120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:18:41,838-Speed 3382.44 samples/sec Loss 4.3065 LearningRate 0.0257 Epoch: 16 Global Step: 81130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:18:44,903-Speed 3342.46 samples/sec Loss 4.3190 LearningRate 0.0257 Epoch: 16 Global Step: 81140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:18:47,895-Speed 3424.01 samples/sec Loss 4.3004 LearningRate 0.0257 Epoch: 16 Global Step: 81150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:18:50,932-Speed 3373.00 samples/sec Loss 4.2489 LearningRate 0.0257 Epoch: 16 Global Step: 81160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:18:53,911-Speed 3438.40 samples/sec Loss 4.3370 LearningRate 0.0256 Epoch: 16 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:57,006-Speed 3308.67 samples/sec Loss 4.4692 LearningRate 0.0256 Epoch: 16 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:18:59,999-Speed 3422.64 samples/sec Loss 4.3928 LearningRate 0.0256 Epoch: 16 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:03,023-Speed 3386.70 samples/sec Loss 4.4576 LearningRate 0.0256 Epoch: 16 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:06,019-Speed 3419.96 samples/sec Loss 4.4690 LearningRate 0.0256 Epoch: 16 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:09,006-Speed 3428.75 samples/sec Loss 4.4260 LearningRate 0.0256 Epoch: 16 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:11,992-Speed 3430.65 samples/sec Loss 4.3957 LearningRate 0.0256 Epoch: 16 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:14,971-Speed 3437.78 samples/sec Loss 4.3248 LearningRate 0.0256 Epoch: 16 Global Step: 81240 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-20 01:19:17,973-Speed 3412.75 samples/sec Loss 4.5110 LearningRate 0.0256 Epoch: 16 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:20,998-Speed 3386.12 samples/sec Loss 4.4492 LearningRate 0.0255 Epoch: 16 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:24,036-Speed 3370.81 samples/sec Loss 4.4248 LearningRate 0.0255 Epoch: 16 Global Step: 81270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:19:27,009-Speed 3445.11 samples/sec Loss 4.4052 LearningRate 0.0255 Epoch: 16 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:30,003-Speed 3421.42 samples/sec Loss 4.5690 LearningRate 0.0255 Epoch: 16 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:33,013-Speed 3402.56 samples/sec Loss 4.4408 LearningRate 0.0255 Epoch: 16 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:35,999-Speed 3431.00 samples/sec Loss 4.4336 LearningRate 0.0255 Epoch: 16 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:39,028-Speed 3381.99 samples/sec Loss 4.6122 LearningRate 0.0255 Epoch: 16 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:42,016-Speed 3427.04 samples/sec Loss 4.4961 LearningRate 0.0255 Epoch: 16 Global Step: 81330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:44,999-Speed 3434.62 samples/sec Loss 4.4907 LearningRate 0.0255 Epoch: 16 Global Step: 81340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:47,999-Speed 3413.20 samples/sec Loss 4.5774 LearningRate 0.0254 Epoch: 16 Global Step: 81350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:50,980-Speed 3436.51 samples/sec Loss 4.6643 LearningRate 0.0254 Epoch: 16 Global Step: 81360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 
01:19:53,961-Speed 3436.08 samples/sec Loss 4.6093 LearningRate 0.0254 Epoch: 16 Global Step: 81370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:19:56,942-Speed 3436.19 samples/sec Loss 4.4741 LearningRate 0.0254 Epoch: 16 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:19:59,920-Speed 3439.90 samples/sec Loss 4.4552 LearningRate 0.0254 Epoch: 16 Global Step: 81390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:02,947-Speed 3383.72 samples/sec Loss 4.4332 LearningRate 0.0254 Epoch: 16 Global Step: 81400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:05,928-Speed 3435.75 samples/sec Loss 4.4218 LearningRate 0.0254 Epoch: 16 Global Step: 81410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:08,953-Speed 3385.98 samples/sec Loss 4.5521 LearningRate 0.0254 Epoch: 16 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:12,112-Speed 3242.38 samples/sec Loss 4.4737 LearningRate 0.0254 Epoch: 16 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:15,125-Speed 3400.42 samples/sec Loss 4.4833 LearningRate 0.0253 Epoch: 16 Global Step: 81440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:18,117-Speed 3423.39 samples/sec Loss 4.5945 LearningRate 0.0253 Epoch: 16 Global Step: 81450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:21,078-Speed 3459.80 samples/sec Loss 4.5134 LearningRate 0.0253 Epoch: 16 Global Step: 81460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:24,070-Speed 3422.77 samples/sec Loss 4.6070 LearningRate 0.0253 Epoch: 16 Global Step: 81470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:27,063-Speed 3422.34 samples/sec Loss 4.3886 LearningRate 0.0253 Epoch: 16 Global Step: 81480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:30,050-Speed 3429.77 samples/sec Loss 4.6151 
LearningRate 0.0253 Epoch: 16 Global Step: 81490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:33,035-Speed 3431.44 samples/sec Loss 4.6616 LearningRate 0.0253 Epoch: 16 Global Step: 81500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:36,023-Speed 3427.40 samples/sec Loss 4.5404 LearningRate 0.0253 Epoch: 16 Global Step: 81510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:39,050-Speed 3383.67 samples/sec Loss 4.5053 LearningRate 0.0253 Epoch: 16 Global Step: 81520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:42,137-Speed 3317.94 samples/sec Loss 4.4953 LearningRate 0.0252 Epoch: 16 Global Step: 81530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:45,127-Speed 3425.74 samples/sec Loss 4.5248 LearningRate 0.0252 Epoch: 16 Global Step: 81540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:48,167-Speed 3369.95 samples/sec Loss 4.6315 LearningRate 0.0252 Epoch: 16 Global Step: 81550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:20:51,157-Speed 3425.94 samples/sec Loss 4.6413 LearningRate 0.0252 Epoch: 16 Global Step: 81560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:54,140-Speed 3432.45 samples/sec Loss 4.6202 LearningRate 0.0252 Epoch: 16 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:20:57,134-Speed 3422.00 samples/sec Loss 4.5951 LearningRate 0.0252 Epoch: 16 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:00,124-Speed 3427.09 samples/sec Loss 4.5879 LearningRate 0.0252 Epoch: 16 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:03,184-Speed 3346.36 samples/sec Loss 4.5850 LearningRate 0.0252 Epoch: 16 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:06,232-Speed 3361.04 samples/sec Loss 4.5906 LearningRate 0.0251 Epoch: 16 Global Step: 81610 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:09,232-Speed 3414.60 samples/sec Loss 4.5308 LearningRate 0.0251 Epoch: 16 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:12,213-Speed 3435.04 samples/sec Loss 4.5520 LearningRate 0.0251 Epoch: 16 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:15,216-Speed 3412.18 samples/sec Loss 4.7017 LearningRate 0.0251 Epoch: 16 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:18,228-Speed 3400.69 samples/sec Loss 4.6313 LearningRate 0.0251 Epoch: 16 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:21,243-Speed 3397.26 samples/sec Loss 4.6179 LearningRate 0.0251 Epoch: 16 Global Step: 81660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:21:24,268-Speed 3386.69 samples/sec Loss 4.5820 LearningRate 0.0251 Epoch: 16 Global Step: 81670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:21:27,254-Speed 3429.87 samples/sec Loss 4.6638 LearningRate 0.0251 Epoch: 16 Global Step: 81680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:21:30,299-Speed 3364.58 samples/sec Loss 4.7406 LearningRate 0.0251 Epoch: 16 Global Step: 81690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:21:33,283-Speed 3432.36 samples/sec Loss 4.6894 LearningRate 0.0250 Epoch: 16 Global Step: 81700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:21:36,273-Speed 3426.18 samples/sec Loss 4.4762 LearningRate 0.0250 Epoch: 16 Global Step: 81710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:21:39,240-Speed 3451.19 samples/sec Loss 4.7801 LearningRate 0.0250 Epoch: 16 Global Step: 81720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:42,221-Speed 3435.84 samples/sec Loss 4.5765 LearningRate 0.0250 Epoch: 16 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-01-20 01:21:45,204-Speed 3434.76 samples/sec Loss 4.7502 LearningRate 0.0250 Epoch: 16 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:48,199-Speed 3418.93 samples/sec Loss 4.6544 LearningRate 0.0250 Epoch: 16 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:51,209-Speed 3403.53 samples/sec Loss 4.8074 LearningRate 0.0250 Epoch: 16 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:54,191-Speed 3434.54 samples/sec Loss 4.6978 LearningRate 0.0250 Epoch: 16 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:21:57,203-Speed 3401.63 samples/sec Loss 4.6217 LearningRate 0.0250 Epoch: 16 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:22:00,239-Speed 3373.94 samples/sec Loss 4.6190 LearningRate 0.0249 Epoch: 16 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:22:03,229-Speed 3424.76 samples/sec Loss 4.6563 LearningRate 0.0249 Epoch: 16 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:22:06,220-Speed 3424.58 samples/sec Loss 4.5422 LearningRate 0.0249 Epoch: 16 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:22:09,217-Speed 3418.54 samples/sec Loss 4.6967 LearningRate 0.0249 Epoch: 16 Global Step: 81820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:22:12,211-Speed 3419.92 samples/sec Loss 4.6996 LearningRate 0.0249 Epoch: 16 Global Step: 81830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:22:15,201-Speed 3426.08 samples/sec Loss 4.7264 LearningRate 0.0249 Epoch: 16 Global Step: 81840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:22:18,226-Speed 3386.52 samples/sec Loss 4.6625 LearningRate 0.0249 Epoch: 16 Global Step: 81850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:21,236-Speed 3402.91 samples/sec 
Loss 4.6679 LearningRate 0.0249 Epoch: 16 Global Step: 81860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:24,243-Speed 3405.95 samples/sec Loss 4.6296 LearningRate 0.0249 Epoch: 16 Global Step: 81870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:27,227-Speed 3433.27 samples/sec Loss 4.7196 LearningRate 0.0248 Epoch: 16 Global Step: 81880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:30,219-Speed 3423.46 samples/sec Loss 4.7112 LearningRate 0.0248 Epoch: 16 Global Step: 81890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:33,299-Speed 3325.14 samples/sec Loss 4.7204 LearningRate 0.0248 Epoch: 16 Global Step: 81900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:36,329-Speed 3380.18 samples/sec Loss 4.6792 LearningRate 0.0248 Epoch: 16 Global Step: 81910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:39,464-Speed 3267.44 samples/sec Loss 4.6801 LearningRate 0.0248 Epoch: 16 Global Step: 81920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:42,556-Speed 3311.99 samples/sec Loss 4.7126 LearningRate 0.0248 Epoch: 16 Global Step: 81930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:45,640-Speed 3321.55 samples/sec Loss 4.6546 LearningRate 0.0248 Epoch: 16 Global Step: 81940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:22:48,720-Speed 3325.57 samples/sec Loss 4.6782 LearningRate 0.0248 Epoch: 16 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:22:51,796-Speed 3331.23 samples/sec Loss 4.7044 LearningRate 0.0248 Epoch: 16 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:22:54,903-Speed 3295.67 samples/sec Loss 4.7776 LearningRate 0.0247 Epoch: 16 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:22:57,974-Speed 3334.92 samples/sec Loss 4.7150 LearningRate 0.0247 Epoch: 16 Global Step: 
81980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:23:00,996-Speed 3390.56 samples/sec Loss 4.7101 LearningRate 0.0247 Epoch: 16 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:23:04,031-Speed 3374.16 samples/sec Loss 4.7381 LearningRate 0.0247 Epoch: 16 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:23:47,153-[lfw][82000]XNorm: 23.168299 Training: 2022-01-20 01:23:47,154-[lfw][82000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-01-20 01:23:47,154-[lfw][82000]Accuracy-Highest: 0.99817 Training: 2022-01-20 01:24:37,183-[cfp_fp][82000]XNorm: 20.764814 Training: 2022-01-20 01:24:37,183-[cfp_fp][82000]Accuracy-Flip: 0.97814+-0.00599 Training: 2022-01-20 01:24:37,184-[cfp_fp][82000]Accuracy-Highest: 0.97914 Training: 2022-01-20 01:25:20,062-[agedb_30][82000]XNorm: 22.757122 Training: 2022-01-20 01:25:20,063-[agedb_30][82000]Accuracy-Flip: 0.97883+-0.00975 Training: 2022-01-20 01:25:20,064-[agedb_30][82000]Accuracy-Highest: 0.98100 Training: 2022-01-20 01:25:23,099-Speed 73.63 samples/sec Loss 4.6902 LearningRate 0.0247 Epoch: 16 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:26,065-Speed 3452.76 samples/sec Loss 4.6624 LearningRate 0.0247 Epoch: 16 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:29,126-Speed 3347.05 samples/sec Loss 4.7417 LearningRate 0.0247 Epoch: 16 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:32,098-Speed 3446.91 samples/sec Loss 4.7943 LearningRate 0.0247 Epoch: 16 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:35,075-Speed 3440.86 samples/sec Loss 4.5186 LearningRate 0.0247 Epoch: 16 Global Step: 82050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-20 01:25:38,032-Speed 3464.43 samples/sec Loss 4.6312 LearningRate 0.0246 Epoch: 16 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-20 01:25:41,010-Speed 3438.42 samples/sec Loss 4.6911 LearningRate 0.0246 Epoch: 16 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:43,996-Speed 3430.38 samples/sec Loss 4.9464 LearningRate 0.0246 Epoch: 16 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:46,990-Speed 3421.43 samples/sec Loss 4.7314 LearningRate 0.0246 Epoch: 16 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:49,985-Speed 3420.50 samples/sec Loss 4.7378 LearningRate 0.0246 Epoch: 16 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:52,979-Speed 3421.60 samples/sec Loss 4.7059 LearningRate 0.0246 Epoch: 16 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:56,125-Speed 3254.97 samples/sec Loss 4.5655 LearningRate 0.0246 Epoch: 16 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:25:59,140-Speed 3397.54 samples/sec Loss 4.7491 LearningRate 0.0246 Epoch: 16 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:02,137-Speed 3417.83 samples/sec Loss 4.6629 LearningRate 0.0246 Epoch: 16 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:05,123-Speed 3430.12 samples/sec Loss 4.7433 LearningRate 0.0245 Epoch: 16 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:08,087-Speed 3456.83 samples/sec Loss 4.7585 LearningRate 0.0245 Epoch: 16 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:11,063-Speed 3441.14 samples/sec Loss 4.7036 LearningRate 0.0245 Epoch: 16 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:14,035-Speed 3445.87 samples/sec Loss 4.7711 LearningRate 0.0245 Epoch: 16 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:17,017-Speed 3436.23 
samples/sec Loss 4.7759 LearningRate 0.0245 Epoch: 16 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:20,008-Speed 3424.99 samples/sec Loss 4.7728 LearningRate 0.0245 Epoch: 16 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:23,093-Speed 3320.17 samples/sec Loss 4.5999 LearningRate 0.0245 Epoch: 16 Global Step: 82210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:26,098-Speed 3407.49 samples/sec Loss 4.7781 LearningRate 0.0245 Epoch: 16 Global Step: 82220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:29,089-Speed 3424.73 samples/sec Loss 4.6514 LearningRate 0.0245 Epoch: 16 Global Step: 82230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:32,067-Speed 3440.46 samples/sec Loss 4.7912 LearningRate 0.0244 Epoch: 16 Global Step: 82240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:35,073-Speed 3407.12 samples/sec Loss 4.7394 LearningRate 0.0244 Epoch: 16 Global Step: 82250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:38,055-Speed 3434.80 samples/sec Loss 4.8956 LearningRate 0.0244 Epoch: 16 Global Step: 82260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:41,038-Speed 3433.52 samples/sec Loss 4.8680 LearningRate 0.0244 Epoch: 16 Global Step: 82270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:44,016-Speed 3439.87 samples/sec Loss 4.7027 LearningRate 0.0244 Epoch: 16 Global Step: 82280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:47,018-Speed 3411.81 samples/sec Loss 4.7130 LearningRate 0.0244 Epoch: 16 Global Step: 82290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:50,062-Speed 3365.63 samples/sec Loss 4.8123 LearningRate 0.0244 Epoch: 16 Global Step: 82300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-20 01:26:53,064-Speed 3412.14 samples/sec Loss 4.7519 LearningRate 0.0244 Epoch: 16 
Global Step: 82310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:56,059-Speed 3420.38 samples/sec Loss 4.9043 LearningRate 0.0244 Epoch: 16 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:26:59,038-Speed 3437.77 samples/sec Loss 4.8374 LearningRate 0.0243 Epoch: 16 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:27:02,076-Speed 3372.41 samples/sec Loss 4.6663 LearningRate 0.0243 Epoch: 16 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:27:05,072-Speed 3418.86 samples/sec Loss 4.6757 LearningRate 0.0243 Epoch: 16 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:27:08,107-Speed 3374.33 samples/sec Loss 4.6659 LearningRate 0.0243 Epoch: 16 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:27:11,081-Speed 3445.08 samples/sec Loss 4.6405 LearningRate 0.0243 Epoch: 16 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-20 01:27:14,057-Speed 3441.30 samples/sec Loss 4.7873 LearningRate 0.0243 Epoch: 16 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:17,042-Speed 3431.37 samples/sec Loss 4.7991 LearningRate 0.0243 Epoch: 16 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:20,023-Speed 3436.42 samples/sec Loss 4.6475 LearningRate 0.0243 Epoch: 16 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:23,047-Speed 3387.31 samples/sec Loss 4.6853 LearningRate 0.0243 Epoch: 16 Global Step: 82410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:27:26,067-Speed 3391.44 samples/sec Loss 4.7034 LearningRate 0.0242 Epoch: 16 Global Step: 82420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:27:29,031-Speed 3456.32 samples/sec Loss 4.6233 LearningRate 0.0242 Epoch: 16 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-20 01:27:32,004-Speed 3445.00 samples/sec Loss 4.8239 LearningRate 0.0242 Epoch: 16 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:34,975-Speed 3446.95 samples/sec Loss 4.6465 LearningRate 0.0242 Epoch: 16 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:37,966-Speed 3425.99 samples/sec Loss 4.6808 LearningRate 0.0242 Epoch: 16 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:40,946-Speed 3437.04 samples/sec Loss 4.6528 LearningRate 0.0242 Epoch: 16 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:43,958-Speed 3399.97 samples/sec Loss 4.7189 LearningRate 0.0242 Epoch: 16 Global Step: 82480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:47,016-Speed 3349.34 samples/sec Loss 4.7276 LearningRate 0.0242 Epoch: 16 Global Step: 82490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:49,994-Speed 3439.86 samples/sec Loss 4.7427 LearningRate 0.0242 Epoch: 16 Global Step: 82500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:52,977-Speed 3433.63 samples/sec Loss 4.6720 LearningRate 0.0241 Epoch: 16 Global Step: 82510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:55,980-Speed 3411.15 samples/sec Loss 4.8274 LearningRate 0.0241 Epoch: 16 Global Step: 82520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:27:58,937-Speed 3463.15 samples/sec Loss 4.7239 LearningRate 0.0241 Epoch: 16 Global Step: 82530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:01,954-Speed 3396.59 samples/sec Loss 4.8546 LearningRate 0.0241 Epoch: 16 Global Step: 82540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:04,932-Speed 3439.61 samples/sec Loss 4.8051 LearningRate 0.0241 Epoch: 16 Global Step: 82550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:07,943-Speed 3402.26 
samples/sec Loss 4.8380 LearningRate 0.0241 Epoch: 16 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:10,919-Speed 3441.37 samples/sec Loss 4.7887 LearningRate 0.0241 Epoch: 16 Global Step: 82570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:13,895-Speed 3442.26 samples/sec Loss 4.6839 LearningRate 0.0241 Epoch: 16 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:16,872-Speed 3441.58 samples/sec Loss 4.8020 LearningRate 0.0241 Epoch: 16 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:19,851-Speed 3438.26 samples/sec Loss 4.7239 LearningRate 0.0241 Epoch: 16 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:22,831-Speed 3437.76 samples/sec Loss 4.8727 LearningRate 0.0240 Epoch: 16 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:25,906-Speed 3330.03 samples/sec Loss 4.6257 LearningRate 0.0240 Epoch: 16 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:28,885-Speed 3438.59 samples/sec Loss 4.7763 LearningRate 0.0240 Epoch: 16 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:31,861-Speed 3441.63 samples/sec Loss 4.6254 LearningRate 0.0240 Epoch: 16 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:34,834-Speed 3445.65 samples/sec Loss 4.8186 LearningRate 0.0240 Epoch: 16 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:37,810-Speed 3441.68 samples/sec Loss 4.7426 LearningRate 0.0240 Epoch: 16 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:40,821-Speed 3402.21 samples/sec Loss 4.7987 LearningRate 0.0240 Epoch: 16 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:43,799-Speed 3439.36 samples/sec Loss 4.5830 LearningRate 0.0240 Epoch: 16 
Global Step: 82680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:46,779-Speed 3436.95 samples/sec Loss 4.7495 LearningRate 0.0240 Epoch: 16 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:49,884-Speed 3299.06 samples/sec Loss 4.7298 LearningRate 0.0239 Epoch: 16 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:52,861-Speed 3440.91 samples/sec Loss 4.7852 LearningRate 0.0239 Epoch: 16 Global Step: 82710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:55,855-Speed 3420.81 samples/sec Loss 4.7521 LearningRate 0.0239 Epoch: 16 Global Step: 82720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:28:58,828-Speed 3445.24 samples/sec Loss 4.7559 LearningRate 0.0239 Epoch: 16 Global Step: 82730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:29:01,853-Speed 3386.01 samples/sec Loss 4.6916 LearningRate 0.0239 Epoch: 16 Global Step: 82740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:29:04,853-Speed 3415.09 samples/sec Loss 4.6525 LearningRate 0.0239 Epoch: 16 Global Step: 82750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:29:07,824-Speed 3447.52 samples/sec Loss 4.6186 LearningRate 0.0239 Epoch: 16 Global Step: 82760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:29:10,866-Speed 3367.13 samples/sec Loss 4.7524 LearningRate 0.0239 Epoch: 16 Global Step: 82770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:29:13,883-Speed 3394.60 samples/sec Loss 4.7161 LearningRate 0.0239 Epoch: 16 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:29:16,860-Speed 3440.78 samples/sec Loss 4.7903 LearningRate 0.0238 Epoch: 16 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:29:19,837-Speed 3440.70 samples/sec Loss 4.7104 LearningRate 0.0238 Epoch: 16 Global Step: 82800 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-20 01:29:22,811-Speed 3444.05 samples/sec Loss 4.6386 LearningRate 0.0238 Epoch: 16 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:29:25,787-Speed 3441.46 samples/sec Loss 4.7969 LearningRate 0.0238 Epoch: 16 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:29:28,771-Speed 3433.62 samples/sec Loss 4.7339 LearningRate 0.0238 Epoch: 16 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:29:31,813-Speed 3367.45 samples/sec Loss 4.8350 LearningRate 0.0238 Epoch: 16 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:29:34,770-Speed 3462.81 samples/sec Loss 4.6804 LearningRate 0.0238 Epoch: 16 Global Step: 82850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:29:37,746-Speed 3442.46 samples/sec Loss 4.8180 LearningRate 0.0238 Epoch: 16 Global Step: 82860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:29:40,725-Speed 3437.82 samples/sec Loss 4.8955 LearningRate 0.0238 Epoch: 16 Global Step: 82870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:29:43,718-Speed 3422.04 samples/sec Loss 4.8474 LearningRate 0.0237 Epoch: 16 Global Step: 82880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:29:46,700-Speed 3435.48 samples/sec Loss 4.7631 LearningRate 0.0237 Epoch: 16 Global Step: 82890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:29:49,685-Speed 3431.70 samples/sec Loss 4.7189 LearningRate 0.0237 Epoch: 16 Global Step: 82900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:29:52,717-Speed 3377.67 samples/sec Loss 4.7959 LearningRate 0.0237 Epoch: 16 Global Step: 82910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:29:55,697-Speed 3437.69 samples/sec Loss 4.7929 LearningRate 0.0237 Epoch: 16 Global Step: 82920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 
01:29:58,675-Speed 3439.51 samples/sec Loss 4.6513 LearningRate 0.0237 Epoch: 16 Global Step: 82930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:01,651-Speed 3441.65 samples/sec Loss 4.8041 LearningRate 0.0237 Epoch: 16 Global Step: 82940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:04,650-Speed 3415.63 samples/sec Loss 4.7673 LearningRate 0.0237 Epoch: 16 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:30:07,628-Speed 3439.77 samples/sec Loss 4.7333 LearningRate 0.0237 Epoch: 16 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:30:10,609-Speed 3435.72 samples/sec Loss 4.7959 LearningRate 0.0236 Epoch: 16 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:30:13,597-Speed 3427.95 samples/sec Loss 4.9526 LearningRate 0.0236 Epoch: 16 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:30:16,581-Speed 3433.04 samples/sec Loss 4.9025 LearningRate 0.0236 Epoch: 16 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:30:19,556-Speed 3442.08 samples/sec Loss 4.6425 LearningRate 0.0236 Epoch: 16 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:30:22,545-Speed 3428.02 samples/sec Loss 4.6092 LearningRate 0.0236 Epoch: 16 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:30:25,548-Speed 3410.03 samples/sec Loss 4.9572 LearningRate 0.0236 Epoch: 16 Global Step: 83020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:28,549-Speed 3413.63 samples/sec Loss 4.8192 LearningRate 0.0236 Epoch: 16 Global Step: 83030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:31,525-Speed 3441.80 samples/sec Loss 4.8048 LearningRate 0.0236 Epoch: 16 Global Step: 83040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:34,557-Speed 3378.06 samples/sec Loss 4.7592 
LearningRate 0.0236 Epoch: 16 Global Step: 83050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:37,544-Speed 3428.97 samples/sec Loss 4.7948 LearningRate 0.0235 Epoch: 16 Global Step: 83060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:40,526-Speed 3435.18 samples/sec Loss 4.6646 LearningRate 0.0235 Epoch: 16 Global Step: 83070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:43,517-Speed 3423.81 samples/sec Loss 4.7572 LearningRate 0.0235 Epoch: 16 Global Step: 83080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:46,543-Speed 3385.63 samples/sec Loss 4.7863 LearningRate 0.0235 Epoch: 16 Global Step: 83090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:49,558-Speed 3397.51 samples/sec Loss 4.8060 LearningRate 0.0235 Epoch: 16 Global Step: 83100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:52,595-Speed 3372.70 samples/sec Loss 4.7284 LearningRate 0.0235 Epoch: 16 Global Step: 83110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:30:55,671-Speed 3330.12 samples/sec Loss 4.6940 LearningRate 0.0235 Epoch: 16 Global Step: 83120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:30:58,649-Speed 3438.51 samples/sec Loss 4.6816 LearningRate 0.0235 Epoch: 16 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:01,667-Speed 3394.84 samples/sec Loss 4.7197 LearningRate 0.0235 Epoch: 16 Global Step: 83140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:04,645-Speed 3438.55 samples/sec Loss 4.7055 LearningRate 0.0235 Epoch: 16 Global Step: 83150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:07,631-Speed 3430.58 samples/sec Loss 4.6905 LearningRate 0.0234 Epoch: 16 Global Step: 83160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:10,666-Speed 3374.46 samples/sec Loss 4.7014 LearningRate 0.0234 Epoch: 16 Global Step: 83170 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:13,674-Speed 3405.46 samples/sec Loss 4.7248 LearningRate 0.0234 Epoch: 16 Global Step: 83180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:16,811-Speed 3265.98 samples/sec Loss 4.8069 LearningRate 0.0234 Epoch: 16 Global Step: 83190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:19,917-Speed 3297.57 samples/sec Loss 4.7382 LearningRate 0.0234 Epoch: 16 Global Step: 83200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:22,908-Speed 3425.43 samples/sec Loss 4.7974 LearningRate 0.0234 Epoch: 16 Global Step: 83210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:25,892-Speed 3432.40 samples/sec Loss 4.9190 LearningRate 0.0234 Epoch: 16 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:28,869-Speed 3439.61 samples/sec Loss 4.7233 LearningRate 0.0234 Epoch: 16 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:31,850-Speed 3436.99 samples/sec Loss 4.7211 LearningRate 0.0234 Epoch: 16 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:34,860-Speed 3402.99 samples/sec Loss 4.5500 LearningRate 0.0233 Epoch: 16 Global Step: 83250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:37,845-Speed 3431.59 samples/sec Loss 4.7390 LearningRate 0.0233 Epoch: 16 Global Step: 83260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:40,823-Speed 3439.29 samples/sec Loss 4.7509 LearningRate 0.0233 Epoch: 16 Global Step: 83270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:43,804-Speed 3435.27 samples/sec Loss 4.7788 LearningRate 0.0233 Epoch: 16 Global Step: 83280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:46,852-Speed 3361.48 samples/sec Loss 4.7653 LearningRate 0.0233 Epoch: 16 Global Step: 83290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-20 01:31:49,881-Speed 3381.40 samples/sec Loss 4.7988 LearningRate 0.0233 Epoch: 16 Global Step: 83300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:52,898-Speed 3395.09 samples/sec Loss 4.7830 LearningRate 0.0233 Epoch: 16 Global Step: 83310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:31:55,974-Speed 3329.27 samples/sec Loss 4.7213 LearningRate 0.0233 Epoch: 16 Global Step: 83320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:31:58,942-Speed 3451.42 samples/sec Loss 4.6823 LearningRate 0.0233 Epoch: 16 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:01,927-Speed 3431.00 samples/sec Loss 4.7558 LearningRate 0.0232 Epoch: 16 Global Step: 83340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:04,955-Speed 3382.81 samples/sec Loss 4.8102 LearningRate 0.0232 Epoch: 16 Global Step: 83350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:07,948-Speed 3422.48 samples/sec Loss 4.6766 LearningRate 0.0232 Epoch: 16 Global Step: 83360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:10,969-Speed 3390.57 samples/sec Loss 4.7459 LearningRate 0.0232 Epoch: 16 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:13,954-Speed 3430.63 samples/sec Loss 4.6803 LearningRate 0.0232 Epoch: 16 Global Step: 83380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:16,945-Speed 3425.00 samples/sec Loss 4.6032 LearningRate 0.0232 Epoch: 16 Global Step: 83390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:19,949-Speed 3410.23 samples/sec Loss 4.8492 LearningRate 0.0232 Epoch: 16 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:22,961-Speed 3400.08 samples/sec Loss 4.7645 LearningRate 0.0232 Epoch: 16 Global Step: 83410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:25,944-Speed 3433.70 samples/sec Loss 
4.6748 LearningRate 0.0232 Epoch: 16 Global Step: 83420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:28,913-Speed 3450.82 samples/sec Loss 4.8125 LearningRate 0.0231 Epoch: 16 Global Step: 83430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:31,898-Speed 3430.92 samples/sec Loss 4.7766 LearningRate 0.0231 Epoch: 16 Global Step: 83440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:34,886-Speed 3427.86 samples/sec Loss 4.6099 LearningRate 0.0231 Epoch: 16 Global Step: 83450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:37,875-Speed 3437.41 samples/sec Loss 4.7475 LearningRate 0.0231 Epoch: 16 Global Step: 83460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:40,857-Speed 3435.43 samples/sec Loss 4.7848 LearningRate 0.0231 Epoch: 16 Global Step: 83470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:43,835-Speed 3440.10 samples/sec Loss 4.8091 LearningRate 0.0231 Epoch: 16 Global Step: 83480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:46,816-Speed 3436.19 samples/sec Loss 4.6708 LearningRate 0.0231 Epoch: 16 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:32:49,827-Speed 3401.43 samples/sec Loss 4.7846 LearningRate 0.0231 Epoch: 16 Global Step: 83500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:32:52,858-Speed 3379.51 samples/sec Loss 4.7714 LearningRate 0.0231 Epoch: 16 Global Step: 83510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:32:55,978-Speed 3282.17 samples/sec Loss 4.6895 LearningRate 0.0231 Epoch: 16 Global Step: 83520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:32:58,999-Speed 3390.60 samples/sec Loss 4.6965 LearningRate 0.0230 Epoch: 16 Global Step: 83530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:33:01,996-Speed 3417.50 samples/sec Loss 4.6574 LearningRate 0.0230 Epoch: 16 Global Step: 83540 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:33:04,981-Speed 3431.90 samples/sec Loss 4.6445 LearningRate 0.0230 Epoch: 16 Global Step: 83550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:33:08,007-Speed 3385.12 samples/sec Loss 4.8461 LearningRate 0.0230 Epoch: 16 Global Step: 83560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:33:10,995-Speed 3428.65 samples/sec Loss 4.7749 LearningRate 0.0230 Epoch: 16 Global Step: 83570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:33:14,088-Speed 3310.73 samples/sec Loss 4.8314 LearningRate 0.0230 Epoch: 16 Global Step: 83580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:33:17,082-Speed 3421.20 samples/sec Loss 4.8119 LearningRate 0.0230 Epoch: 16 Global Step: 83590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:33:20,075-Speed 3422.27 samples/sec Loss 4.8957 LearningRate 0.0230 Epoch: 16 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:23,119-Speed 3365.43 samples/sec Loss 4.6575 LearningRate 0.0230 Epoch: 16 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:26,161-Speed 3367.01 samples/sec Loss 4.7304 LearningRate 0.0229 Epoch: 16 Global Step: 83620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:29,152-Speed 3423.94 samples/sec Loss 4.6054 LearningRate 0.0229 Epoch: 16 Global Step: 83630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:32,138-Speed 3430.76 samples/sec Loss 4.7197 LearningRate 0.0229 Epoch: 16 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:35,162-Speed 3387.00 samples/sec Loss 4.8180 LearningRate 0.0229 Epoch: 16 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:38,159-Speed 3418.48 samples/sec Loss 4.8513 LearningRate 0.0229 Epoch: 16 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-20 01:33:41,189-Speed 3379.56 samples/sec Loss 4.8104 LearningRate 0.0229 Epoch: 16 Global Step: 83670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:44,221-Speed 3377.82 samples/sec Loss 4.7101 LearningRate 0.0229 Epoch: 16 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:47,303-Speed 3324.42 samples/sec Loss 4.6311 LearningRate 0.0229 Epoch: 16 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:50,428-Speed 3277.30 samples/sec Loss 4.5364 LearningRate 0.0229 Epoch: 16 Global Step: 83700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:33:53,464-Speed 3373.98 samples/sec Loss 4.7302 LearningRate 0.0228 Epoch: 16 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:56,475-Speed 3402.22 samples/sec Loss 4.7755 LearningRate 0.0228 Epoch: 16 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:33:59,529-Speed 3352.79 samples/sec Loss 4.7159 LearningRate 0.0228 Epoch: 16 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:34:02,679-Speed 3253.23 samples/sec Loss 4.7474 LearningRate 0.0228 Epoch: 16 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:34:05,704-Speed 3386.08 samples/sec Loss 4.7770 LearningRate 0.0228 Epoch: 16 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:34:08,750-Speed 3362.62 samples/sec Loss 4.6942 LearningRate 0.0228 Epoch: 16 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:34:11,743-Speed 3422.37 samples/sec Loss 4.6971 LearningRate 0.0228 Epoch: 16 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:34:14,733-Speed 3424.63 samples/sec Loss 4.8193 LearningRate 0.0228 Epoch: 16 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:34:17,784-Speed 3357.71 samples/sec Loss 
4.7969 LearningRate 0.0228 Epoch: 16 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:34:20,867-Speed 3322.53 samples/sec Loss 4.8111 LearningRate 0.0228 Epoch: 16 Global Step: 83800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:34:23,912-Speed 3363.45 samples/sec Loss 4.6702 LearningRate 0.0227 Epoch: 16 Global Step: 83810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:26,909-Speed 3417.58 samples/sec Loss 4.6870 LearningRate 0.0227 Epoch: 16 Global Step: 83820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:29,904-Speed 3420.06 samples/sec Loss 4.7755 LearningRate 0.0227 Epoch: 16 Global Step: 83830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:32,898-Speed 3422.23 samples/sec Loss 4.6867 LearningRate 0.0227 Epoch: 16 Global Step: 83840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:35,881-Speed 3433.34 samples/sec Loss 4.7486 LearningRate 0.0227 Epoch: 16 Global Step: 83850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:38,903-Speed 3390.58 samples/sec Loss 4.6993 LearningRate 0.0227 Epoch: 16 Global Step: 83860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:41,925-Speed 3389.16 samples/sec Loss 4.6516 LearningRate 0.0227 Epoch: 16 Global Step: 83870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:44,918-Speed 3421.70 samples/sec Loss 4.8082 LearningRate 0.0227 Epoch: 16 Global Step: 83880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:47,934-Speed 3395.73 samples/sec Loss 4.8157 LearningRate 0.0227 Epoch: 16 Global Step: 83890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:50,979-Speed 3364.61 samples/sec Loss 4.7747 LearningRate 0.0226 Epoch: 16 Global Step: 83900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:34:54,018-Speed 3370.41 samples/sec Loss 4.7092 LearningRate 0.0226 Epoch: 16 Global Step: 83910 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:34:57,018-Speed 3414.50 samples/sec Loss 4.7547 LearningRate 0.0226 Epoch: 16 Global Step: 83920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:35:00,016-Speed 3416.02 samples/sec Loss 4.7119 LearningRate 0.0226 Epoch: 16 Global Step: 83930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:35:03,018-Speed 3412.45 samples/sec Loss 4.7458 LearningRate 0.0226 Epoch: 16 Global Step: 83940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:35:06,003-Speed 3431.38 samples/sec Loss 4.6922 LearningRate 0.0226 Epoch: 16 Global Step: 83950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:35:08,987-Speed 3432.57 samples/sec Loss 4.7443 LearningRate 0.0226 Epoch: 16 Global Step: 83960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:35:11,973-Speed 3430.74 samples/sec Loss 4.7433 LearningRate 0.0226 Epoch: 16 Global Step: 83970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:35:14,962-Speed 3425.75 samples/sec Loss 4.7511 LearningRate 0.0226 Epoch: 16 Global Step: 83980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:35:17,948-Speed 3430.71 samples/sec Loss 4.7336 LearningRate 0.0226 Epoch: 16 Global Step: 83990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:35:20,948-Speed 3414.62 samples/sec Loss 4.8351 LearningRate 0.0225 Epoch: 16 Global Step: 84000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:36:03,764-[lfw][84000]XNorm: 23.383379 Training: 2022-01-20 01:36:03,764-[lfw][84000]Accuracy-Flip: 0.99733+-0.00359 Training: 2022-01-20 01:36:03,765-[lfw][84000]Accuracy-Highest: 0.99817 Training: 2022-01-20 01:36:53,667-[cfp_fp][84000]XNorm: 21.173532 Training: 2022-01-20 01:36:53,668-[cfp_fp][84000]Accuracy-Flip: 0.98043+-0.00876 Training: 2022-01-20 01:36:53,668-[cfp_fp][84000]Accuracy-Highest: 0.98043 Training: 2022-01-20 01:37:36,700-[agedb_30][84000]XNorm: 
22.966796 Training: 2022-01-20 01:37:36,701-[agedb_30][84000]Accuracy-Flip: 0.98000+-0.00592 Training: 2022-01-20 01:37:36,702-[agedb_30][84000]Accuracy-Highest: 0.98100 Training: 2022-01-20 01:37:39,679-Speed 73.81 samples/sec Loss 4.8233 LearningRate 0.0225 Epoch: 16 Global Step: 84010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:37:42,654-Speed 3443.37 samples/sec Loss 4.7554 LearningRate 0.0225 Epoch: 16 Global Step: 84020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:37:45,626-Speed 3446.10 samples/sec Loss 4.7434 LearningRate 0.0225 Epoch: 16 Global Step: 84030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:37:48,641-Speed 3396.84 samples/sec Loss 4.6808 LearningRate 0.0225 Epoch: 16 Global Step: 84040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:37:51,614-Speed 3445.56 samples/sec Loss 4.8543 LearningRate 0.0225 Epoch: 16 Global Step: 84050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:37:54,626-Speed 3401.31 samples/sec Loss 4.7659 LearningRate 0.0225 Epoch: 16 Global Step: 84060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:37:57,607-Speed 3435.75 samples/sec Loss 4.6803 LearningRate 0.0225 Epoch: 16 Global Step: 84070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:00,610-Speed 3410.05 samples/sec Loss 4.7461 LearningRate 0.0225 Epoch: 16 Global Step: 84080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:03,660-Speed 3358.63 samples/sec Loss 4.6137 LearningRate 0.0224 Epoch: 16 Global Step: 84090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:06,728-Speed 3338.37 samples/sec Loss 4.8485 LearningRate 0.0224 Epoch: 16 Global Step: 84100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:09,838-Speed 3293.75 samples/sec Loss 4.7983 LearningRate 0.0224 Epoch: 16 Global Step: 84110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:38:12,869-Speed 3379.63 
samples/sec Loss 4.5769 LearningRate 0.0224 Epoch: 16 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:15,863-Speed 3421.21 samples/sec Loss 4.6746 LearningRate 0.0224 Epoch: 16 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:18,849-Speed 3431.00 samples/sec Loss 4.7691 LearningRate 0.0224 Epoch: 16 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:21,859-Speed 3403.35 samples/sec Loss 4.7738 LearningRate 0.0224 Epoch: 16 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:24,918-Speed 3348.16 samples/sec Loss 4.7971 LearningRate 0.0224 Epoch: 16 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:27,897-Speed 3437.84 samples/sec Loss 4.7408 LearningRate 0.0224 Epoch: 16 Global Step: 84170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:30,888-Speed 3424.26 samples/sec Loss 4.6384 LearningRate 0.0223 Epoch: 16 Global Step: 84180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:33,885-Speed 3418.26 samples/sec Loss 4.8274 LearningRate 0.0223 Epoch: 16 Global Step: 84190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:36,878-Speed 3422.52 samples/sec Loss 4.8021 LearningRate 0.0223 Epoch: 16 Global Step: 84200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:39,872-Speed 3420.91 samples/sec Loss 4.7191 LearningRate 0.0223 Epoch: 16 Global Step: 84210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:42,873-Speed 3413.16 samples/sec Loss 4.6978 LearningRate 0.0223 Epoch: 16 Global Step: 84220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:38:45,836-Speed 3456.36 samples/sec Loss 4.7954 LearningRate 0.0223 Epoch: 16 Global Step: 84230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:48,841-Speed 3408.92 samples/sec Loss 4.7940 LearningRate 0.0223 Epoch: 16 
Global Step: 84240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:51,817-Speed 3442.45 samples/sec Loss 4.6567 LearningRate 0.0223 Epoch: 16 Global Step: 84250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:54,813-Speed 3418.60 samples/sec Loss 4.7869 LearningRate 0.0223 Epoch: 16 Global Step: 84260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:38:57,788-Speed 3442.65 samples/sec Loss 4.7544 LearningRate 0.0223 Epoch: 16 Global Step: 84270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:00,782-Speed 3421.83 samples/sec Loss 4.7196 LearningRate 0.0222 Epoch: 16 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:03,769-Speed 3437.71 samples/sec Loss 4.6419 LearningRate 0.0222 Epoch: 16 Global Step: 84290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:06,745-Speed 3440.87 samples/sec Loss 4.6928 LearningRate 0.0222 Epoch: 16 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:09,736-Speed 3424.68 samples/sec Loss 4.6271 LearningRate 0.0222 Epoch: 16 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:12,732-Speed 3419.23 samples/sec Loss 4.6965 LearningRate 0.0222 Epoch: 16 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:15,733-Speed 3413.20 samples/sec Loss 4.7771 LearningRate 0.0222 Epoch: 16 Global Step: 84330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:18,729-Speed 3418.75 samples/sec Loss 4.7980 LearningRate 0.0222 Epoch: 16 Global Step: 84340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:21,737-Speed 3404.91 samples/sec Loss 4.6498 LearningRate 0.0222 Epoch: 16 Global Step: 84350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:24,722-Speed 3432.11 samples/sec Loss 4.6669 LearningRate 0.0222 Epoch: 16 Global Step: 84360 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-01-20 01:39:27,701-Speed 3437.72 samples/sec Loss 4.7044 LearningRate 0.0221 Epoch: 16 Global Step: 84370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:30,685-Speed 3432.18 samples/sec Loss 4.7323 LearningRate 0.0221 Epoch: 16 Global Step: 84380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:33,668-Speed 3434.75 samples/sec Loss 4.7035 LearningRate 0.0221 Epoch: 16 Global Step: 84390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:36,643-Speed 3442.15 samples/sec Loss 4.6552 LearningRate 0.0221 Epoch: 16 Global Step: 84400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:39,617-Speed 3444.11 samples/sec Loss 4.6958 LearningRate 0.0221 Epoch: 16 Global Step: 84410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:42,600-Speed 3434.55 samples/sec Loss 4.7724 LearningRate 0.0221 Epoch: 16 Global Step: 84420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:39:45,579-Speed 3438.02 samples/sec Loss 4.7629 LearningRate 0.0221 Epoch: 16 Global Step: 84430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:48,574-Speed 3420.27 samples/sec Loss 4.7021 LearningRate 0.0221 Epoch: 16 Global Step: 84440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:51,647-Speed 3332.66 samples/sec Loss 4.7725 LearningRate 0.0221 Epoch: 16 Global Step: 84450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:54,626-Speed 3438.25 samples/sec Loss 4.8008 LearningRate 0.0221 Epoch: 16 Global Step: 84460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:39:57,688-Speed 3345.38 samples/sec Loss 4.6548 LearningRate 0.0220 Epoch: 16 Global Step: 84470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:00,710-Speed 3388.80 samples/sec Loss 4.6961 LearningRate 0.0220 Epoch: 16 Global Step: 84480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:03,692-Speed 3435.32 
samples/sec Loss 4.6278 LearningRate 0.0220 Epoch: 16 Global Step: 84490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:06,672-Speed 3437.74 samples/sec Loss 4.6519 LearningRate 0.0220 Epoch: 16 Global Step: 84500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:09,657-Speed 3432.02 samples/sec Loss 4.7604 LearningRate 0.0220 Epoch: 16 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:12,634-Speed 3440.03 samples/sec Loss 4.5670 LearningRate 0.0220 Epoch: 16 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:15,599-Speed 3455.00 samples/sec Loss 4.7220 LearningRate 0.0220 Epoch: 16 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:18,611-Speed 3400.07 samples/sec Loss 4.8585 LearningRate 0.0220 Epoch: 16 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:21,763-Speed 3249.29 samples/sec Loss 4.6103 LearningRate 0.0220 Epoch: 16 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:24,795-Speed 3378.53 samples/sec Loss 4.7833 LearningRate 0.0219 Epoch: 16 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:27,776-Speed 3436.49 samples/sec Loss 4.6662 LearningRate 0.0219 Epoch: 16 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:30,756-Speed 3436.29 samples/sec Loss 4.7391 LearningRate 0.0219 Epoch: 16 Global Step: 84580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:33,730-Speed 3444.21 samples/sec Loss 4.6782 LearningRate 0.0219 Epoch: 16 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:36,712-Speed 3435.80 samples/sec Loss 4.8425 LearningRate 0.0219 Epoch: 16 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:39,698-Speed 3429.65 samples/sec Loss 4.5956 LearningRate 0.0219 Epoch: 16 
Global Step: 84610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:42,768-Speed 3336.25 samples/sec Loss 4.8215 LearningRate 0.0219 Epoch: 16 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:40:45,771-Speed 3411.16 samples/sec Loss 4.7343 LearningRate 0.0219 Epoch: 16 Global Step: 84630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:40:48,754-Speed 3433.82 samples/sec Loss 4.7933 LearningRate 0.0219 Epoch: 16 Global Step: 84640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:40:51,746-Speed 3423.30 samples/sec Loss 4.6748 LearningRate 0.0219 Epoch: 16 Global Step: 84650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:40:54,744-Speed 3416.20 samples/sec Loss 4.6596 LearningRate 0.0218 Epoch: 16 Global Step: 84660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:40:57,757-Speed 3399.14 samples/sec Loss 4.6881 LearningRate 0.0218 Epoch: 16 Global Step: 84670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:41:00,799-Speed 3367.39 samples/sec Loss 4.5972 LearningRate 0.0218 Epoch: 16 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:03,786-Speed 3429.93 samples/sec Loss 4.6096 LearningRate 0.0218 Epoch: 16 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:06,766-Speed 3437.01 samples/sec Loss 4.5537 LearningRate 0.0218 Epoch: 16 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:09,761-Speed 3420.01 samples/sec Loss 4.6389 LearningRate 0.0218 Epoch: 16 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:12,783-Speed 3388.86 samples/sec Loss 4.6417 LearningRate 0.0218 Epoch: 16 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:15,767-Speed 3432.92 samples/sec Loss 4.6926 LearningRate 0.0218 Epoch: 16 Global Step: 84730 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-20 01:41:18,748-Speed 3436.17 samples/sec Loss 4.6018 LearningRate 0.0218 Epoch: 16 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:21,745-Speed 3417.31 samples/sec Loss 4.6757 LearningRate 0.0218 Epoch: 16 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:24,788-Speed 3366.20 samples/sec Loss 4.6310 LearningRate 0.0217 Epoch: 16 Global Step: 84760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:27,770-Speed 3435.23 samples/sec Loss 4.7016 LearningRate 0.0217 Epoch: 16 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:30,779-Speed 3403.78 samples/sec Loss 4.7127 LearningRate 0.0217 Epoch: 16 Global Step: 84780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:41:33,752-Speed 3445.52 samples/sec Loss 4.6430 LearningRate 0.0217 Epoch: 16 Global Step: 84790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:41:36,718-Speed 3453.63 samples/sec Loss 4.6239 LearningRate 0.0217 Epoch: 16 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:39,697-Speed 3437.70 samples/sec Loss 4.8048 LearningRate 0.0217 Epoch: 16 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:42,757-Speed 3347.96 samples/sec Loss 4.6479 LearningRate 0.0217 Epoch: 16 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:45,764-Speed 3405.43 samples/sec Loss 4.7541 LearningRate 0.0217 Epoch: 16 Global Step: 84830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:48,776-Speed 3402.07 samples/sec Loss 4.6446 LearningRate 0.0217 Epoch: 16 Global Step: 84840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:51,763-Speed 3428.66 samples/sec Loss 4.6401 LearningRate 0.0216 Epoch: 16 Global Step: 84850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 
01:41:54,773-Speed 3402.63 samples/sec Loss 4.7385 LearningRate 0.0216 Epoch: 16 Global Step: 84860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:41:57,752-Speed 3437.95 samples/sec Loss 4.6891 LearningRate 0.0216 Epoch: 16 Global Step: 84870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:00,743-Speed 3424.65 samples/sec Loss 4.6671 LearningRate 0.0216 Epoch: 16 Global Step: 84880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:03,721-Speed 3440.40 samples/sec Loss 4.5971 LearningRate 0.0216 Epoch: 16 Global Step: 84890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:06,745-Speed 3387.12 samples/sec Loss 4.7178 LearningRate 0.0216 Epoch: 16 Global Step: 84900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:42:09,704-Speed 3461.98 samples/sec Loss 4.6109 LearningRate 0.0216 Epoch: 16 Global Step: 84910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:12,679-Speed 3442.28 samples/sec Loss 4.8448 LearningRate 0.0216 Epoch: 16 Global Step: 84920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:15,659-Speed 3436.74 samples/sec Loss 4.8994 LearningRate 0.0216 Epoch: 16 Global Step: 84930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:18,640-Speed 3436.48 samples/sec Loss 4.5755 LearningRate 0.0216 Epoch: 16 Global Step: 84940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:21,636-Speed 3419.06 samples/sec Loss 4.6454 LearningRate 0.0215 Epoch: 16 Global Step: 84950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:24,767-Speed 3271.25 samples/sec Loss 4.6863 LearningRate 0.0215 Epoch: 16 Global Step: 84960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:27,880-Speed 3289.81 samples/sec Loss 4.6477 LearningRate 0.0215 Epoch: 16 Global Step: 84970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:30,883-Speed 3411.89 samples/sec Loss 4.7516 
LearningRate 0.0215 Epoch: 16 Global Step: 84980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:33,866-Speed 3433.77 samples/sec Loss 4.6805 LearningRate 0.0215 Epoch: 16 Global Step: 84990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:36,841-Speed 3442.81 samples/sec Loss 4.6865 LearningRate 0.0215 Epoch: 16 Global Step: 85000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:39,839-Speed 3416.58 samples/sec Loss 4.6568 LearningRate 0.0215 Epoch: 16 Global Step: 85010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:42:42,802-Speed 3456.24 samples/sec Loss 4.6889 LearningRate 0.0215 Epoch: 16 Global Step: 85020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:45,780-Speed 3439.83 samples/sec Loss 4.5897 LearningRate 0.0215 Epoch: 16 Global Step: 85030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:48,764-Speed 3431.77 samples/sec Loss 4.5674 LearningRate 0.0214 Epoch: 16 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:51,761-Speed 3417.72 samples/sec Loss 4.5504 LearningRate 0.0214 Epoch: 16 Global Step: 85050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:54,744-Speed 3434.71 samples/sec Loss 4.5849 LearningRate 0.0214 Epoch: 16 Global Step: 85060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:42:57,725-Speed 3435.74 samples/sec Loss 4.6020 LearningRate 0.0214 Epoch: 16 Global Step: 85070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:00,705-Speed 3436.63 samples/sec Loss 4.6743 LearningRate 0.0214 Epoch: 16 Global Step: 85080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:03,681-Speed 3442.16 samples/sec Loss 4.6404 LearningRate 0.0214 Epoch: 16 Global Step: 85090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:06,658-Speed 3440.54 samples/sec Loss 4.6672 LearningRate 0.0214 Epoch: 16 Global Step: 85100 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:09,638-Speed 3437.51 samples/sec Loss 4.6845 LearningRate 0.0214 Epoch: 16 Global Step: 85110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:12,615-Speed 3439.89 samples/sec Loss 4.5332 LearningRate 0.0214 Epoch: 16 Global Step: 85120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:43:15,601-Speed 3430.63 samples/sec Loss 4.7991 LearningRate 0.0214 Epoch: 16 Global Step: 85130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:43:18,649-Speed 3360.59 samples/sec Loss 4.6172 LearningRate 0.0213 Epoch: 16 Global Step: 85140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:21,648-Speed 3414.73 samples/sec Loss 4.6247 LearningRate 0.0213 Epoch: 16 Global Step: 85150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:24,673-Speed 3387.16 samples/sec Loss 4.8497 LearningRate 0.0213 Epoch: 16 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:27,660-Speed 3429.34 samples/sec Loss 4.7169 LearningRate 0.0213 Epoch: 16 Global Step: 85170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:30,640-Speed 3437.05 samples/sec Loss 4.7619 LearningRate 0.0213 Epoch: 16 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:33,638-Speed 3416.17 samples/sec Loss 4.5413 LearningRate 0.0213 Epoch: 16 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:36,626-Speed 3428.13 samples/sec Loss 4.6134 LearningRate 0.0213 Epoch: 16 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:39,631-Speed 3408.81 samples/sec Loss 4.4317 LearningRate 0.0213 Epoch: 16 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:42,635-Speed 3409.90 samples/sec Loss 4.7014 LearningRate 0.0213 Epoch: 16 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-20 01:43:45,617-Speed 3434.67 samples/sec Loss 4.7123 LearningRate 0.0213 Epoch: 16 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:48,608-Speed 3425.08 samples/sec Loss 4.5483 LearningRate 0.0212 Epoch: 16 Global Step: 85240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:43:51,597-Speed 3426.73 samples/sec Loss 4.6482 LearningRate 0.0212 Epoch: 16 Global Step: 85250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:43:54,585-Speed 3428.49 samples/sec Loss 4.6613 LearningRate 0.0212 Epoch: 16 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:43:57,569-Speed 3432.51 samples/sec Loss 4.6570 LearningRate 0.0212 Epoch: 16 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:00,554-Speed 3430.21 samples/sec Loss 4.5812 LearningRate 0.0212 Epoch: 16 Global Step: 85280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:03,536-Speed 3435.33 samples/sec Loss 4.6005 LearningRate 0.0212 Epoch: 16 Global Step: 85290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:06,541-Speed 3408.14 samples/sec Loss 4.6260 LearningRate 0.0212 Epoch: 16 Global Step: 85300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:09,542-Speed 3413.49 samples/sec Loss 4.6692 LearningRate 0.0212 Epoch: 16 Global Step: 85310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:12,587-Speed 3364.06 samples/sec Loss 4.5578 LearningRate 0.0212 Epoch: 16 Global Step: 85320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:15,582-Speed 3419.61 samples/sec Loss 4.5490 LearningRate 0.0211 Epoch: 16 Global Step: 85330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:18,569-Speed 3429.93 samples/sec Loss 4.5616 LearningRate 0.0211 Epoch: 16 Global Step: 85340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:21,554-Speed 3431.81 samples/sec 
Loss 4.7419 LearningRate 0.0211 Epoch: 16 Global Step: 85350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:24,554-Speed 3414.64 samples/sec Loss 4.4767 LearningRate 0.0211 Epoch: 16 Global Step: 85360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:27,537-Speed 3433.40 samples/sec Loss 4.6311 LearningRate 0.0211 Epoch: 16 Global Step: 85370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:44:30,524-Speed 3429.02 samples/sec Loss 4.6904 LearningRate 0.0211 Epoch: 16 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:33,506-Speed 3434.51 samples/sec Loss 4.7646 LearningRate 0.0211 Epoch: 16 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:36,492-Speed 3429.92 samples/sec Loss 4.7055 LearningRate 0.0211 Epoch: 16 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:39,478-Speed 3431.39 samples/sec Loss 4.5309 LearningRate 0.0211 Epoch: 16 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:42,461-Speed 3432.61 samples/sec Loss 4.7096 LearningRate 0.0211 Epoch: 16 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:45,444-Speed 3434.71 samples/sec Loss 4.6878 LearningRate 0.0210 Epoch: 16 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:48,423-Speed 3438.62 samples/sec Loss 4.7787 LearningRate 0.0210 Epoch: 16 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:51,412-Speed 3425.75 samples/sec Loss 4.6749 LearningRate 0.0210 Epoch: 16 Global Step: 85450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:54,403-Speed 3425.84 samples/sec Loss 4.6339 LearningRate 0.0210 Epoch: 16 Global Step: 85460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:44:57,384-Speed 3435.90 samples/sec Loss 4.7209 LearningRate 0.0210 Epoch: 16 Global Step: 
85470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:00,369-Speed 3432.35 samples/sec Loss 4.7048 LearningRate 0.0210 Epoch: 16 Global Step: 85480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:45:03,385-Speed 3396.10 samples/sec Loss 4.7787 LearningRate 0.0210 Epoch: 16 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:06,376-Speed 3424.38 samples/sec Loss 4.7164 LearningRate 0.0210 Epoch: 16 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:09,364-Speed 3427.65 samples/sec Loss 4.6113 LearningRate 0.0210 Epoch: 16 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:12,362-Speed 3417.00 samples/sec Loss 4.7255 LearningRate 0.0210 Epoch: 16 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:15,352-Speed 3425.34 samples/sec Loss 4.6812 LearningRate 0.0209 Epoch: 16 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:18,339-Speed 3429.52 samples/sec Loss 4.6363 LearningRate 0.0209 Epoch: 16 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:21,345-Speed 3407.21 samples/sec Loss 4.6000 LearningRate 0.0209 Epoch: 16 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:24,371-Speed 3384.82 samples/sec Loss 4.6947 LearningRate 0.0209 Epoch: 16 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:27,357-Speed 3430.08 samples/sec Loss 4.5456 LearningRate 0.0209 Epoch: 16 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:45:30,351-Speed 3422.05 samples/sec Loss 4.6538 LearningRate 0.0209 Epoch: 16 Global Step: 85580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:45:33,340-Speed 3426.50 samples/sec Loss 4.5385 LearningRate 0.0209 Epoch: 16 Global Step: 85590 Fp16 Grad Scale: 32768 Required: 4 hours 
Training: 2022-01-20 01:45:36,320-Speed 3436.40 samples/sec Loss 4.5993 LearningRate 0.0209 Epoch: 16 Global Step: 85600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:45:39,300-Speed 3437.10 samples/sec Loss 4.4814 LearningRate 0.0209 Epoch: 16 Global Step: 85610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:45:42,311-Speed 3401.83 samples/sec Loss 4.6525 LearningRate 0.0209 Epoch: 16 Global Step: 85620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:45:45,328-Speed 3396.32 samples/sec Loss 4.7761 LearningRate 0.0208 Epoch: 16 Global Step: 85630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:45:48,307-Speed 3437.16 samples/sec Loss 4.5811 LearningRate 0.0208 Epoch: 16 Global Step: 85640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:45:51,292-Speed 3431.13 samples/sec Loss 4.6471 LearningRate 0.0208 Epoch: 16 Global Step: 85650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:45:54,283-Speed 3424.88 samples/sec Loss 4.6215 LearningRate 0.0208 Epoch: 16 Global Step: 85660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:45:57,298-Speed 3397.90 samples/sec Loss 4.6692 LearningRate 0.0208 Epoch: 16 Global Step: 85670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:46:00,285-Speed 3428.81 samples/sec Loss 4.4323 LearningRate 0.0208 Epoch: 16 Global Step: 85680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:03,264-Speed 3438.16 samples/sec Loss 4.5996 LearningRate 0.0208 Epoch: 16 Global Step: 85690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:06,280-Speed 3395.72 samples/sec Loss 4.4667 LearningRate 0.0208 Epoch: 16 Global Step: 85700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:09,261-Speed 3437.24 samples/sec Loss 4.6119 LearningRate 0.0208 Epoch: 16 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:12,295-Speed 3375.36 
samples/sec Loss 4.6892 LearningRate 0.0208 Epoch: 16 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:15,377-Speed 3323.81 samples/sec Loss 4.6343 LearningRate 0.0207 Epoch: 16 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:18,386-Speed 3404.20 samples/sec Loss 4.5568 LearningRate 0.0207 Epoch: 16 Global Step: 85740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:21,383-Speed 3416.81 samples/sec Loss 4.5069 LearningRate 0.0207 Epoch: 16 Global Step: 85750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:24,387-Speed 3410.45 samples/sec Loss 4.6331 LearningRate 0.0207 Epoch: 16 Global Step: 85760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:27,420-Speed 3376.62 samples/sec Loss 4.6182 LearningRate 0.0207 Epoch: 16 Global Step: 85770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:30,403-Speed 3433.79 samples/sec Loss 4.4053 LearningRate 0.0207 Epoch: 16 Global Step: 85780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:33,400-Speed 3417.89 samples/sec Loss 4.6435 LearningRate 0.0207 Epoch: 16 Global Step: 85790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:36,383-Speed 3434.01 samples/sec Loss 4.6905 LearningRate 0.0207 Epoch: 16 Global Step: 85800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:39,413-Speed 3381.28 samples/sec Loss 4.5397 LearningRate 0.0207 Epoch: 16 Global Step: 85810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:42,396-Speed 3432.46 samples/sec Loss 4.6633 LearningRate 0.0206 Epoch: 16 Global Step: 85820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:45,395-Speed 3416.24 samples/sec Loss 4.6591 LearningRate 0.0206 Epoch: 16 Global Step: 85830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:48,375-Speed 3436.59 samples/sec Loss 4.6371 LearningRate 0.0206 Epoch: 16 
Global Step: 85840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:51,363-Speed 3428.26 samples/sec Loss 4.6579 LearningRate 0.0206 Epoch: 16 Global Step: 85850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:54,345-Speed 3435.17 samples/sec Loss 4.7527 LearningRate 0.0206 Epoch: 16 Global Step: 85860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:46:57,326-Speed 3435.74 samples/sec Loss 4.5671 LearningRate 0.0206 Epoch: 16 Global Step: 85870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:47:00,323-Speed 3417.18 samples/sec Loss 4.5943 LearningRate 0.0206 Epoch: 16 Global Step: 85880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:47:03,297-Speed 3444.04 samples/sec Loss 4.5985 LearningRate 0.0206 Epoch: 16 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:47:06,280-Speed 3434.37 samples/sec Loss 4.5362 LearningRate 0.0206 Epoch: 16 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:47:09,265-Speed 3431.79 samples/sec Loss 4.7794 LearningRate 0.0206 Epoch: 16 Global Step: 85910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:47:12,229-Speed 3455.62 samples/sec Loss 4.5960 LearningRate 0.0205 Epoch: 16 Global Step: 85920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:47:15,260-Speed 3378.87 samples/sec Loss 4.5849 LearningRate 0.0205 Epoch: 16 Global Step: 85930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:47:18,284-Speed 3387.52 samples/sec Loss 4.6214 LearningRate 0.0205 Epoch: 16 Global Step: 85940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:47:21,261-Speed 3439.53 samples/sec Loss 4.6078 LearningRate 0.0205 Epoch: 16 Global Step: 85950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:47:24,257-Speed 3419.42 samples/sec Loss 4.5754 LearningRate 0.0205 Epoch: 16 Global Step: 85960 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-01-20 01:47:27,268-Speed 3401.41 samples/sec Loss 4.4491 LearningRate 0.0205 Epoch: 16 Global Step: 85970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:47:30,322-Speed 3354.19 samples/sec Loss 4.8358 LearningRate 0.0205 Epoch: 16 Global Step: 85980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:47:43,723-Speed 764.21 samples/sec Loss 4.2201 LearningRate 0.0205 Epoch: 17 Global Step: 85990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:47:46,742-Speed 3392.67 samples/sec Loss 3.8244 LearningRate 0.0205 Epoch: 17 Global Step: 86000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:48:30,010-[lfw][86000]XNorm: 22.711859 Training: 2022-01-20 01:48:30,011-[lfw][86000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-01-20 01:48:30,012-[lfw][86000]Accuracy-Highest: 0.99817 Training: 2022-01-20 01:49:20,027-[cfp_fp][86000]XNorm: 20.900128 Training: 2022-01-20 01:49:20,028-[cfp_fp][86000]Accuracy-Flip: 0.98186+-0.00667 Training: 2022-01-20 01:49:20,028-[cfp_fp][86000]Accuracy-Highest: 0.98186 Training: 2022-01-20 01:50:03,083-[agedb_30][86000]XNorm: 22.529005 Training: 2022-01-20 01:50:03,084-[agedb_30][86000]Accuracy-Flip: 0.98233+-0.00821 Training: 2022-01-20 01:50:03,084-[agedb_30][86000]Accuracy-Highest: 0.98233 Training: 2022-01-20 01:50:06,266-Speed 73.39 samples/sec Loss 3.7853 LearningRate 0.0205 Epoch: 17 Global Step: 86010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:09,250-Speed 3431.88 samples/sec Loss 3.6854 LearningRate 0.0204 Epoch: 17 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:50:12,230-Speed 3437.35 samples/sec Loss 3.8154 LearningRate 0.0204 Epoch: 17 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:50:15,203-Speed 3445.70 samples/sec Loss 3.6997 LearningRate 0.0204 Epoch: 17 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 
01:50:18,196-Speed 3421.90 samples/sec Loss 3.8873 LearningRate 0.0204 Epoch: 17 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:50:21,241-Speed 3363.24 samples/sec Loss 3.8434 LearningRate 0.0204 Epoch: 17 Global Step: 86060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:24,230-Speed 3426.96 samples/sec Loss 3.8615 LearningRate 0.0204 Epoch: 17 Global Step: 86070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:27,286-Speed 3351.96 samples/sec Loss 3.8759 LearningRate 0.0204 Epoch: 17 Global Step: 86080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:30,291-Speed 3408.63 samples/sec Loss 3.8225 LearningRate 0.0204 Epoch: 17 Global Step: 86090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:33,277-Speed 3429.93 samples/sec Loss 3.8234 LearningRate 0.0204 Epoch: 17 Global Step: 86100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:36,297-Speed 3392.16 samples/sec Loss 3.8192 LearningRate 0.0204 Epoch: 17 Global Step: 86110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:39,323-Speed 3384.20 samples/sec Loss 3.8469 LearningRate 0.0203 Epoch: 17 Global Step: 86120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:42,309-Speed 3430.43 samples/sec Loss 3.9255 LearningRate 0.0203 Epoch: 17 Global Step: 86130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:45,372-Speed 3345.13 samples/sec Loss 3.8532 LearningRate 0.0203 Epoch: 17 Global Step: 86140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:48,520-Speed 3253.15 samples/sec Loss 3.8629 LearningRate 0.0203 Epoch: 17 Global Step: 86150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:50:51,507-Speed 3428.70 samples/sec Loss 3.9271 LearningRate 0.0203 Epoch: 17 Global Step: 86160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:50:55,042-Speed 2898.15 samples/sec Loss 3.7531 
LearningRate 0.0203 Epoch: 17 Global Step: 86170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:50:58,686-Speed 2810.23 samples/sec Loss 3.7678 LearningRate 0.0203 Epoch: 17 Global Step: 86180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:02,114-Speed 2987.43 samples/sec Loss 3.8880 LearningRate 0.0203 Epoch: 17 Global Step: 86190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:05,122-Speed 3406.59 samples/sec Loss 3.9426 LearningRate 0.0203 Epoch: 17 Global Step: 86200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:08,121-Speed 3414.54 samples/sec Loss 3.7769 LearningRate 0.0203 Epoch: 17 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:11,241-Speed 3283.35 samples/sec Loss 3.8207 LearningRate 0.0202 Epoch: 17 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:14,257-Speed 3397.07 samples/sec Loss 3.8616 LearningRate 0.0202 Epoch: 17 Global Step: 86230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:17,324-Speed 3338.79 samples/sec Loss 3.9718 LearningRate 0.0202 Epoch: 17 Global Step: 86240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:20,374-Speed 3359.10 samples/sec Loss 4.0358 LearningRate 0.0202 Epoch: 17 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:23,384-Speed 3402.72 samples/sec Loss 3.9235 LearningRate 0.0202 Epoch: 17 Global Step: 86260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:51:26,361-Speed 3440.43 samples/sec Loss 3.8235 LearningRate 0.0202 Epoch: 17 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:29,345-Speed 3433.89 samples/sec Loss 3.9206 LearningRate 0.0202 Epoch: 17 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:32,327-Speed 3434.29 samples/sec Loss 3.9406 LearningRate 0.0202 Epoch: 17 Global Step: 86290 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:35,345-Speed 3393.49 samples/sec Loss 3.9079 LearningRate 0.0202 Epoch: 17 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:38,338-Speed 3422.39 samples/sec Loss 3.9575 LearningRate 0.0202 Epoch: 17 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:41,325-Speed 3429.10 samples/sec Loss 3.9507 LearningRate 0.0201 Epoch: 17 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:44,319-Speed 3421.25 samples/sec Loss 3.9201 LearningRate 0.0201 Epoch: 17 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:47,302-Speed 3433.75 samples/sec Loss 3.9890 LearningRate 0.0201 Epoch: 17 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:50,310-Speed 3404.96 samples/sec Loss 3.9211 LearningRate 0.0201 Epoch: 17 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:53,305-Speed 3420.44 samples/sec Loss 3.9436 LearningRate 0.0201 Epoch: 17 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:51:56,320-Speed 3396.92 samples/sec Loss 3.9290 LearningRate 0.0201 Epoch: 17 Global Step: 86370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:51:59,388-Speed 3339.08 samples/sec Loss 3.9414 LearningRate 0.0201 Epoch: 17 Global Step: 86380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:52:02,482-Speed 3310.07 samples/sec Loss 3.9610 LearningRate 0.0201 Epoch: 17 Global Step: 86390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:52:05,475-Speed 3422.29 samples/sec Loss 3.8710 LearningRate 0.0201 Epoch: 17 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:08,460-Speed 3431.42 samples/sec Loss 3.9161 LearningRate 0.0201 Epoch: 17 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-20 01:52:11,441-Speed 3435.49 samples/sec Loss 3.9791 LearningRate 0.0200 Epoch: 17 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:14,471-Speed 3381.60 samples/sec Loss 3.9922 LearningRate 0.0200 Epoch: 17 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:17,452-Speed 3435.21 samples/sec Loss 3.8811 LearningRate 0.0200 Epoch: 17 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:20,513-Speed 3347.24 samples/sec Loss 3.9777 LearningRate 0.0200 Epoch: 17 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:23,524-Speed 3401.47 samples/sec Loss 4.0448 LearningRate 0.0200 Epoch: 17 Global Step: 86460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:26,509-Speed 3431.31 samples/sec Loss 3.9610 LearningRate 0.0200 Epoch: 17 Global Step: 86470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:29,580-Speed 3334.75 samples/sec Loss 4.0304 LearningRate 0.0200 Epoch: 17 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:32,640-Speed 3347.27 samples/sec Loss 4.0372 LearningRate 0.0200 Epoch: 17 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:52:35,624-Speed 3433.35 samples/sec Loss 4.0332 LearningRate 0.0200 Epoch: 17 Global Step: 86500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:52:38,616-Speed 3423.11 samples/sec Loss 3.9506 LearningRate 0.0200 Epoch: 17 Global Step: 86510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:52:41,578-Speed 3457.66 samples/sec Loss 4.1303 LearningRate 0.0199 Epoch: 17 Global Step: 86520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:52:44,642-Speed 3343.82 samples/sec Loss 4.1765 LearningRate 0.0199 Epoch: 17 Global Step: 86530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:52:47,698-Speed 3351.16 samples/sec 
Loss 4.0068 LearningRate 0.0199 Epoch: 17 Global Step: 86540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:52:50,683-Speed 3432.59 samples/sec Loss 4.0647 LearningRate 0.0199 Epoch: 17 Global Step: 86550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:52:53,659-Speed 3440.71 samples/sec Loss 4.0525 LearningRate 0.0199 Epoch: 17 Global Step: 86560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:52:56,638-Speed 3438.57 samples/sec Loss 4.1856 LearningRate 0.0199 Epoch: 17 Global Step: 86570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:52:59,618-Speed 3437.91 samples/sec Loss 4.1460 LearningRate 0.0199 Epoch: 17 Global Step: 86580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:53:02,605-Speed 3428.44 samples/sec Loss 4.0376 LearningRate 0.0199 Epoch: 17 Global Step: 86590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:53:05,581-Speed 3441.75 samples/sec Loss 4.0905 LearningRate 0.0199 Epoch: 17 Global Step: 86600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:53:08,577-Speed 3419.33 samples/sec Loss 4.0560 LearningRate 0.0199 Epoch: 17 Global Step: 86610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:53:11,583-Speed 3406.45 samples/sec Loss 4.1047 LearningRate 0.0198 Epoch: 17 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:14,559-Speed 3442.36 samples/sec Loss 4.1041 LearningRate 0.0198 Epoch: 17 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:17,596-Speed 3373.11 samples/sec Loss 3.9691 LearningRate 0.0198 Epoch: 17 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:20,634-Speed 3371.14 samples/sec Loss 3.9601 LearningRate 0.0198 Epoch: 17 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:23,611-Speed 3441.10 samples/sec Loss 4.0696 LearningRate 0.0198 Epoch: 17 Global Step: 
86660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:26,588-Speed 3439.73 samples/sec Loss 4.1690 LearningRate 0.0198 Epoch: 17 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:29,578-Speed 3426.85 samples/sec Loss 4.1930 LearningRate 0.0198 Epoch: 17 Global Step: 86680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:32,563-Speed 3430.38 samples/sec Loss 4.0960 LearningRate 0.0198 Epoch: 17 Global Step: 86690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:35,565-Speed 3413.07 samples/sec Loss 4.1254 LearningRate 0.0198 Epoch: 17 Global Step: 86700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:38,622-Speed 3349.93 samples/sec Loss 4.1038 LearningRate 0.0198 Epoch: 17 Global Step: 86710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:41,625-Speed 3410.63 samples/sec Loss 4.1288 LearningRate 0.0197 Epoch: 17 Global Step: 86720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:53:44,587-Speed 3458.12 samples/sec Loss 4.2341 LearningRate 0.0197 Epoch: 17 Global Step: 86730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:47,719-Speed 3271.09 samples/sec Loss 3.9969 LearningRate 0.0197 Epoch: 17 Global Step: 86740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:50,698-Speed 3438.53 samples/sec Loss 4.0051 LearningRate 0.0197 Epoch: 17 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:53,694-Speed 3418.41 samples/sec Loss 4.1915 LearningRate 0.0197 Epoch: 17 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:56,686-Speed 3423.10 samples/sec Loss 4.1594 LearningRate 0.0197 Epoch: 17 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:53:59,674-Speed 3428.46 samples/sec Loss 4.1315 LearningRate 0.0197 Epoch: 17 Global Step: 86780 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-01-20 01:54:02,677-Speed 3410.14 samples/sec Loss 4.0779 LearningRate 0.0197 Epoch: 17 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:54:05,851-Speed 3227.24 samples/sec Loss 4.1145 LearningRate 0.0197 Epoch: 17 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:54:08,903-Speed 3356.00 samples/sec Loss 4.1423 LearningRate 0.0197 Epoch: 17 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:54:11,888-Speed 3431.81 samples/sec Loss 4.1302 LearningRate 0.0196 Epoch: 17 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:54:14,877-Speed 3427.25 samples/sec Loss 4.1853 LearningRate 0.0196 Epoch: 17 Global Step: 86830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:54:17,845-Speed 3450.35 samples/sec Loss 4.1331 LearningRate 0.0196 Epoch: 17 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:54:20,834-Speed 3427.48 samples/sec Loss 4.2011 LearningRate 0.0196 Epoch: 17 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:54:23,802-Speed 3450.55 samples/sec Loss 4.1285 LearningRate 0.0196 Epoch: 17 Global Step: 86860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:26,779-Speed 3440.38 samples/sec Loss 4.1168 LearningRate 0.0196 Epoch: 17 Global Step: 86870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:29,802-Speed 3389.00 samples/sec Loss 4.1303 LearningRate 0.0196 Epoch: 17 Global Step: 86880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:32,788-Speed 3429.68 samples/sec Loss 4.1198 LearningRate 0.0196 Epoch: 17 Global Step: 86890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:35,776-Speed 3428.15 samples/sec Loss 3.9704 LearningRate 0.0196 Epoch: 17 Global Step: 86900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:38,760-Speed 3432.76 
samples/sec Loss 4.0733 LearningRate 0.0196 Epoch: 17 Global Step: 86910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:41,747-Speed 3428.60 samples/sec Loss 4.0424 LearningRate 0.0195 Epoch: 17 Global Step: 86920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:44,746-Speed 3417.53 samples/sec Loss 4.1149 LearningRate 0.0195 Epoch: 17 Global Step: 86930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:47,725-Speed 3437.58 samples/sec Loss 4.0623 LearningRate 0.0195 Epoch: 17 Global Step: 86940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:50,713-Speed 3428.37 samples/sec Loss 4.1030 LearningRate 0.0195 Epoch: 17 Global Step: 86950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:54:53,700-Speed 3428.42 samples/sec Loss 4.0782 LearningRate 0.0195 Epoch: 17 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:54:56,693-Speed 3422.11 samples/sec Loss 4.1325 LearningRate 0.0195 Epoch: 17 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:54:59,698-Speed 3409.25 samples/sec Loss 4.2336 LearningRate 0.0195 Epoch: 17 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:02,683-Speed 3431.67 samples/sec Loss 4.1407 LearningRate 0.0195 Epoch: 17 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:05,662-Speed 3438.29 samples/sec Loss 4.0545 LearningRate 0.0195 Epoch: 17 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:08,654-Speed 3423.66 samples/sec Loss 4.1973 LearningRate 0.0195 Epoch: 17 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:11,634-Speed 3436.63 samples/sec Loss 4.1437 LearningRate 0.0194 Epoch: 17 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:14,620-Speed 3430.56 samples/sec Loss 4.0383 LearningRate 0.0194 Epoch: 17 
Global Step: 87030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:17,635-Speed 3397.07 samples/sec Loss 4.1534 LearningRate 0.0194 Epoch: 17 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:20,665-Speed 3380.20 samples/sec Loss 4.2203 LearningRate 0.0194 Epoch: 17 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:23,660-Speed 3420.24 samples/sec Loss 4.1998 LearningRate 0.0194 Epoch: 17 Global Step: 87060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:55:26,640-Speed 3437.16 samples/sec Loss 4.1170 LearningRate 0.0194 Epoch: 17 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:29,661-Speed 3390.70 samples/sec Loss 4.1053 LearningRate 0.0194 Epoch: 17 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:32,645-Speed 3432.18 samples/sec Loss 4.1162 LearningRate 0.0194 Epoch: 17 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:35,625-Speed 3438.17 samples/sec Loss 4.1172 LearningRate 0.0194 Epoch: 17 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:38,628-Speed 3410.44 samples/sec Loss 4.0935 LearningRate 0.0194 Epoch: 17 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:41,637-Speed 3403.43 samples/sec Loss 4.2230 LearningRate 0.0193 Epoch: 17 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:55:44,610-Speed 3444.96 samples/sec Loss 4.2953 LearningRate 0.0193 Epoch: 17 Global Step: 87130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:55:47,599-Speed 3426.86 samples/sec Loss 4.1634 LearningRate 0.0193 Epoch: 17 Global Step: 87140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:55:50,585-Speed 3431.43 samples/sec Loss 4.2337 LearningRate 0.0193 Epoch: 17 Global Step: 87150 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-01-20 01:55:53,568-Speed 3432.87 samples/sec Loss 4.1789 LearningRate 0.0193 Epoch: 17 Global Step: 87160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:55:56,549-Speed 3436.96 samples/sec Loss 4.1157 LearningRate 0.0193 Epoch: 17 Global Step: 87170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:55:59,549-Speed 3414.47 samples/sec Loss 4.2981 LearningRate 0.0193 Epoch: 17 Global Step: 87180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:56:02,623-Speed 3331.21 samples/sec Loss 4.0376 LearningRate 0.0193 Epoch: 17 Global Step: 87190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:56:05,627-Speed 3410.89 samples/sec Loss 4.1415 LearningRate 0.0193 Epoch: 17 Global Step: 87200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:56:08,612-Speed 3431.08 samples/sec Loss 4.2008 LearningRate 0.0193 Epoch: 17 Global Step: 87210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:56:11,638-Speed 3384.79 samples/sec Loss 4.2155 LearningRate 0.0192 Epoch: 17 Global Step: 87220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-20 01:56:14,631-Speed 3422.53 samples/sec Loss 4.1644 LearningRate 0.0192 Epoch: 17 Global Step: 87230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:17,671-Speed 3368.32 samples/sec Loss 4.0823 LearningRate 0.0192 Epoch: 17 Global Step: 87240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:20,753-Speed 3323.54 samples/sec Loss 4.0732 LearningRate 0.0192 Epoch: 17 Global Step: 87250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:23,821-Speed 3338.73 samples/sec Loss 4.2270 LearningRate 0.0192 Epoch: 17 Global Step: 87260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:26,859-Speed 3371.48 samples/sec Loss 4.2129 LearningRate 0.0192 Epoch: 17 Global Step: 87270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:29,960-Speed 3303.23 
samples/sec Loss 4.1641 LearningRate 0.0192 Epoch: 17 Global Step: 87280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:32,947-Speed 3429.11 samples/sec Loss 4.0992 LearningRate 0.0192 Epoch: 17 Global Step: 87290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:35,934-Speed 3429.93 samples/sec Loss 4.3693 LearningRate 0.0192 Epoch: 17 Global Step: 87300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:38,929-Speed 3418.95 samples/sec Loss 4.1238 LearningRate 0.0192 Epoch: 17 Global Step: 87310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:41,945-Speed 3396.62 samples/sec Loss 4.1886 LearningRate 0.0192 Epoch: 17 Global Step: 87320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:44,948-Speed 3411.60 samples/sec Loss 4.2287 LearningRate 0.0191 Epoch: 17 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:56:47,944-Speed 3419.16 samples/sec Loss 4.2323 LearningRate 0.0191 Epoch: 17 Global Step: 87340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:50,926-Speed 3435.20 samples/sec Loss 4.2011 LearningRate 0.0191 Epoch: 17 Global Step: 87350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:53,905-Speed 3438.26 samples/sec Loss 4.0945 LearningRate 0.0191 Epoch: 17 Global Step: 87360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:56,887-Speed 3434.64 samples/sec Loss 4.2333 LearningRate 0.0191 Epoch: 17 Global Step: 87370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:56:59,875-Speed 3428.85 samples/sec Loss 4.2189 LearningRate 0.0191 Epoch: 17 Global Step: 87380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:57:02,917-Speed 3366.35 samples/sec Loss 4.2516 LearningRate 0.0191 Epoch: 17 Global Step: 87390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:57:05,959-Speed 3367.83 samples/sec Loss 4.0881 LearningRate 0.0191 Epoch: 17 
Global Step: 87400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:57:08,956-Speed 3417.53 samples/sec Loss 4.0996 LearningRate 0.0191 Epoch: 17 Global Step: 87410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:57:11,985-Speed 3381.67 samples/sec Loss 4.2789 LearningRate 0.0191 Epoch: 17 Global Step: 87420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:57:15,038-Speed 3355.76 samples/sec Loss 4.2077 LearningRate 0.0190 Epoch: 17 Global Step: 87430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 01:57:18,069-Speed 3379.48 samples/sec Loss 4.1924 LearningRate 0.0190 Epoch: 17 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:21,051-Speed 3434.55 samples/sec Loss 4.1135 LearningRate 0.0190 Epoch: 17 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:24,039-Speed 3428.04 samples/sec Loss 4.2883 LearningRate 0.0190 Epoch: 17 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:27,051-Speed 3400.47 samples/sec Loss 4.2195 LearningRate 0.0190 Epoch: 17 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:30,039-Speed 3429.31 samples/sec Loss 4.3570 LearningRate 0.0190 Epoch: 17 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:33,036-Speed 3417.10 samples/sec Loss 4.2680 LearningRate 0.0190 Epoch: 17 Global Step: 87490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:36,022-Speed 3431.08 samples/sec Loss 4.1232 LearningRate 0.0190 Epoch: 17 Global Step: 87500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:39,011-Speed 3426.08 samples/sec Loss 4.1367 LearningRate 0.0190 Epoch: 17 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:42,018-Speed 3406.02 samples/sec Loss 4.2895 LearningRate 0.0190 Epoch: 17 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-20 01:57:45,006-Speed 3428.52 samples/sec Loss 4.2622 LearningRate 0.0189 Epoch: 17 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:48,011-Speed 3408.81 samples/sec Loss 4.3006 LearningRate 0.0189 Epoch: 17 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:50,998-Speed 3429.33 samples/sec Loss 4.3276 LearningRate 0.0189 Epoch: 17 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:53,980-Speed 3434.72 samples/sec Loss 4.1637 LearningRate 0.0189 Epoch: 17 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:57:57,027-Speed 3361.01 samples/sec Loss 4.2018 LearningRate 0.0189 Epoch: 17 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:00,056-Speed 3381.57 samples/sec Loss 4.2312 LearningRate 0.0189 Epoch: 17 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:03,044-Speed 3428.44 samples/sec Loss 4.3998 LearningRate 0.0189 Epoch: 17 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:06,028-Speed 3432.49 samples/sec Loss 4.2202 LearningRate 0.0189 Epoch: 17 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:09,021-Speed 3421.61 samples/sec Loss 4.2154 LearningRate 0.0189 Epoch: 17 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:12,002-Speed 3436.22 samples/sec Loss 4.3144 LearningRate 0.0189 Epoch: 17 Global Step: 87620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:14,994-Speed 3423.38 samples/sec Loss 4.2115 LearningRate 0.0188 Epoch: 17 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:17,988-Speed 3421.92 samples/sec Loss 4.2912 LearningRate 0.0188 Epoch: 17 Global Step: 87640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:58:21,008-Speed 3439.48 
samples/sec Loss 4.2598 LearningRate 0.0188 Epoch: 17 Global Step: 87650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:23,991-Speed 3433.14 samples/sec Loss 4.2000 LearningRate 0.0188 Epoch: 17 Global Step: 87660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:26,980-Speed 3425.88 samples/sec Loss 4.2876 LearningRate 0.0188 Epoch: 17 Global Step: 87670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:30,030-Speed 3405.37 samples/sec Loss 4.2087 LearningRate 0.0188 Epoch: 17 Global Step: 87680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:33,030-Speed 3413.53 samples/sec Loss 4.3837 LearningRate 0.0188 Epoch: 17 Global Step: 87690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:36,071-Speed 3428.07 samples/sec Loss 4.2368 LearningRate 0.0188 Epoch: 17 Global Step: 87700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:39,052-Speed 3435.63 samples/sec Loss 4.2821 LearningRate 0.0188 Epoch: 17 Global Step: 87710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:42,139-Speed 3424.07 samples/sec Loss 4.2557 LearningRate 0.0188 Epoch: 17 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:45,502-Speed 3343.62 samples/sec Loss 4.3673 LearningRate 0.0188 Epoch: 17 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:48,502-Speed 3413.24 samples/sec Loss 4.1764 LearningRate 0.0187 Epoch: 17 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:51,537-Speed 3375.63 samples/sec Loss 4.4083 LearningRate 0.0187 Epoch: 17 Global Step: 87750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:58:54,539-Speed 3411.68 samples/sec Loss 4.2799 LearningRate 0.0187 Epoch: 17 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:58:57,523-Speed 3432.51 samples/sec Loss 4.0896 LearningRate 0.0187 Epoch: 17 
Global Step: 87770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:00,513-Speed 3426.01 samples/sec Loss 4.2435 LearningRate 0.0187 Epoch: 17 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:03,497-Speed 3432.06 samples/sec Loss 4.1984 LearningRate 0.0187 Epoch: 17 Global Step: 87790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:06,481-Speed 3432.93 samples/sec Loss 4.1447 LearningRate 0.0187 Epoch: 17 Global Step: 87800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:09,481-Speed 3414.32 samples/sec Loss 4.2063 LearningRate 0.0187 Epoch: 17 Global Step: 87810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:12,469-Speed 3427.77 samples/sec Loss 4.1975 LearningRate 0.0187 Epoch: 17 Global Step: 87820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:15,465-Speed 3418.42 samples/sec Loss 4.1062 LearningRate 0.0187 Epoch: 17 Global Step: 87830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:18,457-Speed 3424.05 samples/sec Loss 4.2428 LearningRate 0.0186 Epoch: 17 Global Step: 87840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:21,481-Speed 3386.36 samples/sec Loss 4.2791 LearningRate 0.0186 Epoch: 17 Global Step: 87850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:24,447-Speed 3453.17 samples/sec Loss 4.2664 LearningRate 0.0186 Epoch: 17 Global Step: 87860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:27,454-Speed 3406.33 samples/sec Loss 4.2385 LearningRate 0.0186 Epoch: 17 Global Step: 87870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:30,482-Speed 3382.95 samples/sec Loss 4.3101 LearningRate 0.0186 Epoch: 17 Global Step: 87880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:33,486-Speed 3410.10 samples/sec Loss 4.2876 LearningRate 0.0186 Epoch: 17 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-20 01:59:36,484-Speed 3416.69 samples/sec Loss 4.2453 LearningRate 0.0186 Epoch: 17 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:39,469-Speed 3431.04 samples/sec Loss 4.2752 LearningRate 0.0186 Epoch: 17 Global Step: 87910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:42,468-Speed 3415.20 samples/sec Loss 4.1192 LearningRate 0.0186 Epoch: 17 Global Step: 87920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:45,484-Speed 3396.51 samples/sec Loss 4.2603 LearningRate 0.0186 Epoch: 17 Global Step: 87930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:48,476-Speed 3422.71 samples/sec Loss 4.1610 LearningRate 0.0185 Epoch: 17 Global Step: 87940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:51,461-Speed 3431.53 samples/sec Loss 4.2509 LearningRate 0.0185 Epoch: 17 Global Step: 87950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 01:59:54,448-Speed 3429.07 samples/sec Loss 4.2079 LearningRate 0.0185 Epoch: 17 Global Step: 87960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 01:59:57,462-Speed 3397.86 samples/sec Loss 4.2051 LearningRate 0.0185 Epoch: 17 Global Step: 87970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 02:00:00,509-Speed 3362.02 samples/sec Loss 4.2514 LearningRate 0.0185 Epoch: 17 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:00:03,497-Speed 3427.95 samples/sec Loss 4.2354 LearningRate 0.0185 Epoch: 17 Global Step: 87990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:00:06,489-Speed 3424.90 samples/sec Loss 4.1935 LearningRate 0.0185 Epoch: 17 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:00:49,563-[lfw][88000]XNorm: 21.913748 Training: 2022-01-20 02:00:49,564-[lfw][88000]Accuracy-Flip: 0.99783+-0.00211 Training: 2022-01-20 02:00:49,564-[lfw][88000]Accuracy-Highest: 
0.99817 Training: 2022-01-20 02:01:39,646-[cfp_fp][88000]XNorm: 20.337992 Training: 2022-01-20 02:01:39,647-[cfp_fp][88000]Accuracy-Flip: 0.97900+-0.00816 Training: 2022-01-20 02:01:39,647-[cfp_fp][88000]Accuracy-Highest: 0.98186 Training: 2022-01-20 02:02:22,757-[agedb_30][88000]XNorm: 21.816966 Training: 2022-01-20 02:02:22,758-[agedb_30][88000]Accuracy-Flip: 0.97967+-0.00788 Training: 2022-01-20 02:02:22,759-[agedb_30][88000]Accuracy-Highest: 0.98233 Training: 2022-01-20 02:02:25,755-Speed 73.53 samples/sec Loss 4.2275 LearningRate 0.0185 Epoch: 17 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:02:28,754-Speed 3415.96 samples/sec Loss 4.2683 LearningRate 0.0185 Epoch: 17 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:02:31,737-Speed 3433.75 samples/sec Loss 4.2245 LearningRate 0.0185 Epoch: 17 Global Step: 88030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:34,843-Speed 3299.45 samples/sec Loss 4.2859 LearningRate 0.0185 Epoch: 17 Global Step: 88040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:37,816-Speed 3444.71 samples/sec Loss 4.2176 LearningRate 0.0184 Epoch: 17 Global Step: 88050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:40,796-Speed 3437.26 samples/sec Loss 4.2230 LearningRate 0.0184 Epoch: 17 Global Step: 88060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:43,811-Speed 3397.84 samples/sec Loss 4.2323 LearningRate 0.0184 Epoch: 17 Global Step: 88070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:46,799-Speed 3427.16 samples/sec Loss 4.2549 LearningRate 0.0184 Epoch: 17 Global Step: 88080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:49,779-Speed 3437.66 samples/sec Loss 4.3032 LearningRate 0.0184 Epoch: 17 Global Step: 88090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:52,769-Speed 3425.49 samples/sec Loss 4.2577 LearningRate 
0.0184 Epoch: 17 Global Step: 88100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:55,749-Speed 3437.61 samples/sec Loss 4.1385 LearningRate 0.0184 Epoch: 17 Global Step: 88110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:02:58,727-Speed 3438.54 samples/sec Loss 4.2442 LearningRate 0.0184 Epoch: 17 Global Step: 88120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:03:01,716-Speed 3426.70 samples/sec Loss 4.2991 LearningRate 0.0184 Epoch: 17 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:04,700-Speed 3433.62 samples/sec Loss 4.1867 LearningRate 0.0184 Epoch: 17 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:07,685-Speed 3430.69 samples/sec Loss 4.1169 LearningRate 0.0183 Epoch: 17 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:10,666-Speed 3437.42 samples/sec Loss 4.0999 LearningRate 0.0183 Epoch: 17 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:13,653-Speed 3429.24 samples/sec Loss 4.1976 LearningRate 0.0183 Epoch: 17 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:16,738-Speed 3319.54 samples/sec Loss 4.2764 LearningRate 0.0183 Epoch: 17 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:19,737-Speed 3415.50 samples/sec Loss 4.2328 LearningRate 0.0183 Epoch: 17 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:22,735-Speed 3416.71 samples/sec Loss 4.1568 LearningRate 0.0183 Epoch: 17 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:25,832-Speed 3306.71 samples/sec Loss 4.0613 LearningRate 0.0183 Epoch: 17 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:28,845-Speed 3400.52 samples/sec Loss 4.3535 LearningRate 0.0183 Epoch: 17 Global Step: 88220 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-01-20 02:03:31,817-Speed 3446.32 samples/sec Loss 4.1351 LearningRate 0.0183 Epoch: 17 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:34,820-Speed 3411.16 samples/sec Loss 4.3191 LearningRate 0.0183 Epoch: 17 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:37,816-Speed 3418.72 samples/sec Loss 4.1649 LearningRate 0.0183 Epoch: 17 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:40,813-Speed 3417.00 samples/sec Loss 4.4545 LearningRate 0.0182 Epoch: 17 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:43,938-Speed 3277.91 samples/sec Loss 4.2217 LearningRate 0.0182 Epoch: 17 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:46,929-Speed 3424.46 samples/sec Loss 4.1655 LearningRate 0.0182 Epoch: 17 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:49,930-Speed 3412.93 samples/sec Loss 4.2808 LearningRate 0.0182 Epoch: 17 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:03:52,917-Speed 3428.59 samples/sec Loss 4.1872 LearningRate 0.0182 Epoch: 17 Global Step: 88300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:03:55,906-Speed 3427.52 samples/sec Loss 4.2390 LearningRate 0.0182 Epoch: 17 Global Step: 88310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:03:58,900-Speed 3420.95 samples/sec Loss 4.1825 LearningRate 0.0182 Epoch: 17 Global Step: 88320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:01,943-Speed 3365.79 samples/sec Loss 4.2154 LearningRate 0.0182 Epoch: 17 Global Step: 88330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:04,927-Speed 3433.44 samples/sec Loss 4.1680 LearningRate 0.0182 Epoch: 17 Global Step: 88340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 
02:04:07,906-Speed 3437.27 samples/sec Loss 4.2209 LearningRate 0.0182 Epoch: 17 Global Step: 88350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:10,887-Speed 3437.26 samples/sec Loss 4.2649 LearningRate 0.0181 Epoch: 17 Global Step: 88360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:13,878-Speed 3423.84 samples/sec Loss 4.1231 LearningRate 0.0181 Epoch: 17 Global Step: 88370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:16,952-Speed 3332.68 samples/sec Loss 4.2481 LearningRate 0.0181 Epoch: 17 Global Step: 88380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:19,988-Speed 3373.88 samples/sec Loss 4.2307 LearningRate 0.0181 Epoch: 17 Global Step: 88390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:23,045-Speed 3350.23 samples/sec Loss 4.2379 LearningRate 0.0181 Epoch: 17 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:04:26,033-Speed 3427.74 samples/sec Loss 4.2670 LearningRate 0.0181 Epoch: 17 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:04:29,013-Speed 3437.98 samples/sec Loss 4.3318 LearningRate 0.0181 Epoch: 17 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:04:31,992-Speed 3438.38 samples/sec Loss 4.1867 LearningRate 0.0181 Epoch: 17 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:04:35,000-Speed 3405.29 samples/sec Loss 4.2453 LearningRate 0.0181 Epoch: 17 Global Step: 88440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:37,977-Speed 3439.46 samples/sec Loss 4.2670 LearningRate 0.0181 Epoch: 17 Global Step: 88450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:40,965-Speed 3428.38 samples/sec Loss 4.2880 LearningRate 0.0181 Epoch: 17 Global Step: 88460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:43,960-Speed 3419.88 samples/sec Loss 4.2903 
LearningRate 0.0180 Epoch: 17 Global Step: 88470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:46,950-Speed 3425.01 samples/sec Loss 4.3497 LearningRate 0.0180 Epoch: 17 Global Step: 88480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:49,937-Speed 3429.73 samples/sec Loss 4.2028 LearningRate 0.0180 Epoch: 17 Global Step: 88490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:52,920-Speed 3433.98 samples/sec Loss 4.2966 LearningRate 0.0180 Epoch: 17 Global Step: 88500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:55,896-Speed 3442.12 samples/sec Loss 4.1952 LearningRate 0.0180 Epoch: 17 Global Step: 88510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:04:58,920-Speed 3386.96 samples/sec Loss 4.1831 LearningRate 0.0180 Epoch: 17 Global Step: 88520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:01,903-Speed 3433.12 samples/sec Loss 4.2550 LearningRate 0.0180 Epoch: 17 Global Step: 88530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:04,885-Speed 3435.49 samples/sec Loss 4.1086 LearningRate 0.0180 Epoch: 17 Global Step: 88540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:05:07,890-Speed 3407.46 samples/sec Loss 4.2964 LearningRate 0.0180 Epoch: 17 Global Step: 88550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:05:10,887-Speed 3418.85 samples/sec Loss 4.1740 LearningRate 0.0180 Epoch: 17 Global Step: 88560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:05:13,914-Speed 3383.41 samples/sec Loss 4.3352 LearningRate 0.0179 Epoch: 17 Global Step: 88570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:05:16,964-Speed 3358.48 samples/sec Loss 4.1610 LearningRate 0.0179 Epoch: 17 Global Step: 88580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:05:19,950-Speed 3430.55 samples/sec Loss 4.1871 LearningRate 0.0179 Epoch: 17 Global Step: 88590 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:05:22,940-Speed 3425.29 samples/sec Loss 4.1740 LearningRate 0.0179 Epoch: 17 Global Step: 88600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:25,943-Speed 3410.87 samples/sec Loss 4.2002 LearningRate 0.0179 Epoch: 17 Global Step: 88610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:28,923-Speed 3437.00 samples/sec Loss 4.2809 LearningRate 0.0179 Epoch: 17 Global Step: 88620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:31,901-Speed 3439.67 samples/sec Loss 4.3319 LearningRate 0.0179 Epoch: 17 Global Step: 88630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:34,986-Speed 3319.92 samples/sec Loss 4.2237 LearningRate 0.0179 Epoch: 17 Global Step: 88640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:37,966-Speed 3436.99 samples/sec Loss 4.3020 LearningRate 0.0179 Epoch: 17 Global Step: 88650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:40,980-Speed 3398.92 samples/sec Loss 4.1017 LearningRate 0.0179 Epoch: 17 Global Step: 88660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:44,017-Speed 3372.91 samples/sec Loss 4.1544 LearningRate 0.0179 Epoch: 17 Global Step: 88670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:46,999-Speed 3434.92 samples/sec Loss 4.1653 LearningRate 0.0178 Epoch: 17 Global Step: 88680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:49,980-Speed 3436.59 samples/sec Loss 4.2098 LearningRate 0.0178 Epoch: 17 Global Step: 88690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:05:52,975-Speed 3419.36 samples/sec Loss 4.0540 LearningRate 0.0178 Epoch: 17 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:05:55,967-Speed 3423.59 samples/sec Loss 4.2968 LearningRate 0.0178 Epoch: 17 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-20 02:05:58,958-Speed 3423.91 samples/sec Loss 4.1574 LearningRate 0.0178 Epoch: 17 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:01,945-Speed 3429.71 samples/sec Loss 4.2636 LearningRate 0.0178 Epoch: 17 Global Step: 88730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:04,936-Speed 3424.51 samples/sec Loss 4.3262 LearningRate 0.0178 Epoch: 17 Global Step: 88740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:07,978-Speed 3366.34 samples/sec Loss 4.2186 LearningRate 0.0178 Epoch: 17 Global Step: 88750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:11,052-Speed 3332.95 samples/sec Loss 4.1977 LearningRate 0.0178 Epoch: 17 Global Step: 88760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:14,082-Speed 3380.03 samples/sec Loss 4.2753 LearningRate 0.0178 Epoch: 17 Global Step: 88770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:17,091-Speed 3404.38 samples/sec Loss 4.0852 LearningRate 0.0177 Epoch: 17 Global Step: 88780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:20,073-Speed 3435.22 samples/sec Loss 4.2101 LearningRate 0.0177 Epoch: 17 Global Step: 88790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:23,025-Speed 3469.75 samples/sec Loss 4.2980 LearningRate 0.0177 Epoch: 17 Global Step: 88800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:26,010-Speed 3431.28 samples/sec Loss 4.2548 LearningRate 0.0177 Epoch: 17 Global Step: 88810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:29,115-Speed 3298.90 samples/sec Loss 4.2199 LearningRate 0.0177 Epoch: 17 Global Step: 88820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:32,147-Speed 3378.26 samples/sec Loss 4.2200 LearningRate 0.0177 Epoch: 17 Global Step: 88830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:35,198-Speed 3357.37 samples/sec Loss 
4.3534 LearningRate 0.0177 Epoch: 17 Global Step: 88840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:38,188-Speed 3425.24 samples/sec Loss 4.1579 LearningRate 0.0177 Epoch: 17 Global Step: 88850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:41,223-Speed 3374.38 samples/sec Loss 4.1271 LearningRate 0.0177 Epoch: 17 Global Step: 88860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:44,216-Speed 3422.67 samples/sec Loss 4.3479 LearningRate 0.0177 Epoch: 17 Global Step: 88870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:47,197-Speed 3435.85 samples/sec Loss 4.2927 LearningRate 0.0177 Epoch: 17 Global Step: 88880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:50,177-Speed 3438.26 samples/sec Loss 4.1330 LearningRate 0.0176 Epoch: 17 Global Step: 88890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:06:53,260-Speed 3322.44 samples/sec Loss 4.0890 LearningRate 0.0176 Epoch: 17 Global Step: 88900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:56,245-Speed 3430.70 samples/sec Loss 4.1965 LearningRate 0.0176 Epoch: 17 Global Step: 88910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:06:59,298-Speed 3355.35 samples/sec Loss 4.2959 LearningRate 0.0176 Epoch: 17 Global Step: 88920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:07:02,368-Speed 3336.61 samples/sec Loss 4.2459 LearningRate 0.0176 Epoch: 17 Global Step: 88930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:07:05,379-Speed 3401.78 samples/sec Loss 4.2302 LearningRate 0.0176 Epoch: 17 Global Step: 88940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:07:08,376-Speed 3417.90 samples/sec Loss 4.2953 LearningRate 0.0176 Epoch: 17 Global Step: 88950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:07:11,370-Speed 3421.79 samples/sec Loss 4.2060 LearningRate 0.0176 Epoch: 17 Global Step: 88960 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:14,378-Speed 3404.62 samples/sec Loss 4.2575 LearningRate 0.0176 Epoch: 17 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:17,418-Speed 3368.87 samples/sec Loss 4.3905 LearningRate 0.0176 Epoch: 17 Global Step: 88980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:20,502-Speed 3321.57 samples/sec Loss 4.2685 LearningRate 0.0176 Epoch: 17 Global Step: 88990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:23,587-Speed 3320.60 samples/sec Loss 4.0595 LearningRate 0.0175 Epoch: 17 Global Step: 89000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:26,565-Speed 3439.04 samples/sec Loss 4.2343 LearningRate 0.0175 Epoch: 17 Global Step: 89010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:29,625-Speed 3348.10 samples/sec Loss 4.2992 LearningRate 0.0175 Epoch: 17 Global Step: 89020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:32,688-Speed 3343.73 samples/sec Loss 4.2294 LearningRate 0.0175 Epoch: 17 Global Step: 89030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:35,683-Speed 3420.62 samples/sec Loss 4.3364 LearningRate 0.0175 Epoch: 17 Global Step: 89040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:38,713-Speed 3380.28 samples/sec Loss 4.2206 LearningRate 0.0175 Epoch: 17 Global Step: 89050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:41,703-Speed 3425.26 samples/sec Loss 4.1701 LearningRate 0.0175 Epoch: 17 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:07:44,689-Speed 3430.19 samples/sec Loss 4.1559 LearningRate 0.0175 Epoch: 17 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:07:47,685-Speed 3419.05 samples/sec Loss 4.3155 LearningRate 0.0175 Epoch: 17 Global Step: 89080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-01-20 02:07:50,732-Speed 3362.25 samples/sec Loss 4.3518 LearningRate 0.0175 Epoch: 17 Global Step: 89090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:53,729-Speed 3416.79 samples/sec Loss 4.2691 LearningRate 0.0174 Epoch: 17 Global Step: 89100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:56,715-Speed 3430.85 samples/sec Loss 4.2240 LearningRate 0.0174 Epoch: 17 Global Step: 89110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:07:59,816-Speed 3303.22 samples/sec Loss 4.2250 LearningRate 0.0174 Epoch: 17 Global Step: 89120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:02,829-Speed 3399.21 samples/sec Loss 4.1976 LearningRate 0.0174 Epoch: 17 Global Step: 89130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:05,834-Speed 3408.60 samples/sec Loss 4.2613 LearningRate 0.0174 Epoch: 17 Global Step: 89140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:08,825-Speed 3424.51 samples/sec Loss 3.9861 LearningRate 0.0174 Epoch: 17 Global Step: 89150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:11,857-Speed 3378.59 samples/sec Loss 4.1560 LearningRate 0.0174 Epoch: 17 Global Step: 89160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:14,856-Speed 3415.22 samples/sec Loss 4.2219 LearningRate 0.0174 Epoch: 17 Global Step: 89170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:17,840-Speed 3432.39 samples/sec Loss 4.3658 LearningRate 0.0174 Epoch: 17 Global Step: 89180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:08:20,807-Speed 3453.40 samples/sec Loss 4.2909 LearningRate 0.0174 Epoch: 17 Global Step: 89190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:23,796-Speed 3426.23 samples/sec Loss 4.1977 LearningRate 0.0174 Epoch: 17 Global Step: 89200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:26,778-Speed 3434.31 samples/sec Loss 
4.2014 LearningRate 0.0173 Epoch: 17 Global Step: 89210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:29,765-Speed 3430.08 samples/sec Loss 4.2762 LearningRate 0.0173 Epoch: 17 Global Step: 89220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:32,752-Speed 3428.34 samples/sec Loss 4.1391 LearningRate 0.0173 Epoch: 17 Global Step: 89230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:35,733-Speed 3436.25 samples/sec Loss 4.0703 LearningRate 0.0173 Epoch: 17 Global Step: 89240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:38,723-Speed 3425.64 samples/sec Loss 4.2865 LearningRate 0.0173 Epoch: 17 Global Step: 89250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:41,822-Speed 3304.96 samples/sec Loss 4.2230 LearningRate 0.0173 Epoch: 17 Global Step: 89260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:44,833-Speed 3401.81 samples/sec Loss 4.2659 LearningRate 0.0173 Epoch: 17 Global Step: 89270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:47,875-Speed 3367.04 samples/sec Loss 4.2981 LearningRate 0.0173 Epoch: 17 Global Step: 89280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:08:50,859-Speed 3432.91 samples/sec Loss 4.0648 LearningRate 0.0173 Epoch: 17 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:08:53,845-Speed 3431.07 samples/sec Loss 4.1687 LearningRate 0.0173 Epoch: 17 Global Step: 89300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:08:56,827-Speed 3434.96 samples/sec Loss 4.0796 LearningRate 0.0173 Epoch: 17 Global Step: 89310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:08:59,826-Speed 3414.73 samples/sec Loss 4.3253 LearningRate 0.0172 Epoch: 17 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:02,812-Speed 3430.38 samples/sec Loss 4.2293 LearningRate 0.0172 Epoch: 17 Global Step: 89330 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:05,799-Speed 3429.25 samples/sec Loss 4.1545 LearningRate 0.0172 Epoch: 17 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:08,854-Speed 3352.84 samples/sec Loss 4.2733 LearningRate 0.0172 Epoch: 17 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:11,832-Speed 3439.69 samples/sec Loss 4.1325 LearningRate 0.0172 Epoch: 17 Global Step: 89360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:14,826-Speed 3420.56 samples/sec Loss 4.1852 LearningRate 0.0172 Epoch: 17 Global Step: 89370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:17,813-Speed 3429.48 samples/sec Loss 4.3688 LearningRate 0.0172 Epoch: 17 Global Step: 89380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:20,800-Speed 3429.97 samples/sec Loss 4.0970 LearningRate 0.0172 Epoch: 17 Global Step: 89390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 02:09:23,765-Speed 3453.77 samples/sec Loss 4.2049 LearningRate 0.0172 Epoch: 17 Global Step: 89400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:26,747-Speed 3434.77 samples/sec Loss 4.2228 LearningRate 0.0172 Epoch: 17 Global Step: 89410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:29,736-Speed 3426.84 samples/sec Loss 4.0588 LearningRate 0.0172 Epoch: 17 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:32,726-Speed 3425.52 samples/sec Loss 4.1575 LearningRate 0.0171 Epoch: 17 Global Step: 89430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:35,758-Speed 3378.65 samples/sec Loss 4.0711 LearningRate 0.0171 Epoch: 17 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:38,755-Speed 3418.02 samples/sec Loss 4.1508 LearningRate 0.0171 Epoch: 17 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-20 02:09:41,741-Speed 3430.04 samples/sec Loss 4.2840 LearningRate 0.0171 Epoch: 17 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:44,729-Speed 3427.78 samples/sec Loss 4.2526 LearningRate 0.0171 Epoch: 17 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:09:47,698-Speed 3450.08 samples/sec Loss 4.1826 LearningRate 0.0171 Epoch: 17 Global Step: 89480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:09:50,693-Speed 3419.86 samples/sec Loss 4.2519 LearningRate 0.0171 Epoch: 17 Global Step: 89490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:09:53,675-Speed 3435.40 samples/sec Loss 4.1619 LearningRate 0.0171 Epoch: 17 Global Step: 89500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:09:56,655-Speed 3436.07 samples/sec Loss 4.2038 LearningRate 0.0171 Epoch: 17 Global Step: 89510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:09:59,656-Speed 3413.88 samples/sec Loss 4.2061 LearningRate 0.0171 Epoch: 17 Global Step: 89520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:02,638-Speed 3434.93 samples/sec Loss 4.2296 LearningRate 0.0170 Epoch: 17 Global Step: 89530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:05,627-Speed 3426.61 samples/sec Loss 4.1787 LearningRate 0.0170 Epoch: 17 Global Step: 89540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:08,630-Speed 3411.03 samples/sec Loss 4.0859 LearningRate 0.0170 Epoch: 17 Global Step: 89550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:11,629-Speed 3414.68 samples/sec Loss 4.1961 LearningRate 0.0170 Epoch: 17 Global Step: 89560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:14,614-Speed 3432.59 samples/sec Loss 4.1690 LearningRate 0.0170 Epoch: 17 Global Step: 89570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:17,582-Speed 3450.38 samples/sec Loss 
4.2401 LearningRate 0.0170 Epoch: 17 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:20,573-Speed 3424.80 samples/sec Loss 4.2311 LearningRate 0.0170 Epoch: 17 Global Step: 89590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:23,562-Speed 3426.44 samples/sec Loss 4.1017 LearningRate 0.0170 Epoch: 17 Global Step: 89600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:26,558-Speed 3418.45 samples/sec Loss 4.1971 LearningRate 0.0170 Epoch: 17 Global Step: 89610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:29,556-Speed 3417.29 samples/sec Loss 4.1694 LearningRate 0.0170 Epoch: 17 Global Step: 89620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:32,542-Speed 3430.47 samples/sec Loss 4.1501 LearningRate 0.0170 Epoch: 17 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:35,650-Speed 3295.17 samples/sec Loss 4.1408 LearningRate 0.0169 Epoch: 17 Global Step: 89640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:38,644-Speed 3421.17 samples/sec Loss 4.1834 LearningRate 0.0169 Epoch: 17 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:41,631-Speed 3428.72 samples/sec Loss 4.0552 LearningRate 0.0169 Epoch: 17 Global Step: 89660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:44,617-Speed 3431.57 samples/sec Loss 4.2659 LearningRate 0.0169 Epoch: 17 Global Step: 89670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:10:47,603-Speed 3429.99 samples/sec Loss 4.2148 LearningRate 0.0169 Epoch: 17 Global Step: 89680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:10:50,622-Speed 3393.12 samples/sec Loss 4.2650 LearningRate 0.0169 Epoch: 17 Global Step: 89690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:10:53,603-Speed 3435.93 samples/sec Loss 4.2490 LearningRate 0.0169 Epoch: 17 Global Step: 89700 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:10:56,585-Speed 3434.23 samples/sec Loss 4.1820 LearningRate 0.0169 Epoch: 17 Global Step: 89710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:10:59,567-Speed 3434.61 samples/sec Loss 4.1785 LearningRate 0.0169 Epoch: 17 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:02,552-Speed 3431.28 samples/sec Loss 4.2532 LearningRate 0.0169 Epoch: 17 Global Step: 89730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:05,558-Speed 3407.75 samples/sec Loss 4.0849 LearningRate 0.0169 Epoch: 17 Global Step: 89740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:08,581-Speed 3388.11 samples/sec Loss 4.0801 LearningRate 0.0168 Epoch: 17 Global Step: 89750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:11,586-Speed 3409.14 samples/sec Loss 4.2592 LearningRate 0.0168 Epoch: 17 Global Step: 89760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:14,636-Speed 3357.90 samples/sec Loss 4.0525 LearningRate 0.0168 Epoch: 17 Global Step: 89770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:17,629-Speed 3421.92 samples/sec Loss 4.2716 LearningRate 0.0168 Epoch: 17 Global Step: 89780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 02:11:20,610-Speed 3437.29 samples/sec Loss 4.1626 LearningRate 0.0168 Epoch: 17 Global Step: 89790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:23,603-Speed 3421.46 samples/sec Loss 4.1358 LearningRate 0.0168 Epoch: 17 Global Step: 89800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:26,571-Speed 3451.82 samples/sec Loss 4.1980 LearningRate 0.0168 Epoch: 17 Global Step: 89810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:29,570-Speed 3415.50 samples/sec Loss 4.1333 LearningRate 0.0168 Epoch: 17 Global Step: 89820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-01-20 02:11:32,554-Speed 3432.86 samples/sec Loss 4.2712 LearningRate 0.0168 Epoch: 17 Global Step: 89830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:35,542-Speed 3427.23 samples/sec Loss 4.2276 LearningRate 0.0168 Epoch: 17 Global Step: 89840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:38,529-Speed 3430.38 samples/sec Loss 4.1737 LearningRate 0.0168 Epoch: 17 Global Step: 89850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:41,530-Speed 3413.12 samples/sec Loss 4.2234 LearningRate 0.0167 Epoch: 17 Global Step: 89860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:44,519-Speed 3426.63 samples/sec Loss 4.1121 LearningRate 0.0167 Epoch: 17 Global Step: 89870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:47,507-Speed 3427.45 samples/sec Loss 4.1142 LearningRate 0.0167 Epoch: 17 Global Step: 89880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:50,569-Speed 3345.53 samples/sec Loss 4.1319 LearningRate 0.0167 Epoch: 17 Global Step: 89890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:53,576-Speed 3405.74 samples/sec Loss 4.2068 LearningRate 0.0167 Epoch: 17 Global Step: 89900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:11:56,564-Speed 3428.35 samples/sec Loss 4.3323 LearningRate 0.0167 Epoch: 17 Global Step: 89910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:11:59,555-Speed 3425.05 samples/sec Loss 4.2729 LearningRate 0.0167 Epoch: 17 Global Step: 89920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:12:02,543-Speed 3427.42 samples/sec Loss 4.1337 LearningRate 0.0167 Epoch: 17 Global Step: 89930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:12:05,525-Speed 3435.40 samples/sec Loss 4.2809 LearningRate 0.0167 Epoch: 17 Global Step: 89940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:12:08,544-Speed 3392.60 samples/sec Loss 
4.1227 LearningRate 0.0167 Epoch: 17 Global Step: 89950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:12:11,513-Speed 3449.69 samples/sec Loss 4.2387 LearningRate 0.0167 Epoch: 17 Global Step: 89960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:12:14,501-Speed 3428.30 samples/sec Loss 4.1028 LearningRate 0.0166 Epoch: 17 Global Step: 89970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:12:17,513-Speed 3400.39 samples/sec Loss 4.0988 LearningRate 0.0166 Epoch: 17 Global Step: 89980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:12:20,619-Speed 3297.70 samples/sec Loss 4.2020 LearningRate 0.0166 Epoch: 17 Global Step: 89990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:12:23,614-Speed 3420.09 samples/sec Loss 4.2276 LearningRate 0.0166 Epoch: 17 Global Step: 90000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:13:06,609-[lfw][90000]XNorm: 21.997053 Training: 2022-01-20 02:13:06,610-[lfw][90000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-01-20 02:13:06,610-[lfw][90000]Accuracy-Highest: 0.99817 Training: 2022-01-20 02:13:56,565-[cfp_fp][90000]XNorm: 19.902932 Training: 2022-01-20 02:13:56,566-[cfp_fp][90000]Accuracy-Flip: 0.97800+-0.00677 Training: 2022-01-20 02:13:56,566-[cfp_fp][90000]Accuracy-Highest: 0.98186 Training: 2022-01-20 02:14:39,557-[agedb_30][90000]XNorm: 22.022524 Training: 2022-01-20 02:14:39,557-[agedb_30][90000]Accuracy-Flip: 0.98017+-0.00689 Training: 2022-01-20 02:14:39,558-[agedb_30][90000]Accuracy-Highest: 0.98233 Training: 2022-01-20 02:14:42,537-Speed 73.71 samples/sec Loss 4.2018 LearningRate 0.0166 Epoch: 17 Global Step: 90010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:14:45,534-Speed 3417.21 samples/sec Loss 4.2341 LearningRate 0.0166 Epoch: 17 Global Step: 90020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:14:48,593-Speed 3348.08 samples/sec Loss 4.0265 LearningRate 0.0166 Epoch: 17 
Global Step: 90030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:14:51,615-Speed 3390.43 samples/sec Loss 4.1973 LearningRate 0.0166 Epoch: 17 Global Step: 90040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:14:54,602-Speed 3428.88 samples/sec Loss 4.1198 LearningRate 0.0166 Epoch: 17 Global Step: 90050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:14:57,663-Speed 3345.82 samples/sec Loss 4.1268 LearningRate 0.0166 Epoch: 17 Global Step: 90060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:00,813-Speed 3252.03 samples/sec Loss 4.1000 LearningRate 0.0166 Epoch: 17 Global Step: 90070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:03,860-Speed 3361.49 samples/sec Loss 4.1852 LearningRate 0.0165 Epoch: 17 Global Step: 90080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:06,861-Speed 3413.03 samples/sec Loss 4.3852 LearningRate 0.0165 Epoch: 17 Global Step: 90090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:09,896-Speed 3374.28 samples/sec Loss 4.1226 LearningRate 0.0165 Epoch: 17 Global Step: 90100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:12,923-Speed 3384.58 samples/sec Loss 4.2962 LearningRate 0.0165 Epoch: 17 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:15,903-Speed 3437.00 samples/sec Loss 4.0463 LearningRate 0.0165 Epoch: 17 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:18,907-Speed 3409.02 samples/sec Loss 4.2386 LearningRate 0.0165 Epoch: 17 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:21,894-Speed 3429.85 samples/sec Loss 4.1449 LearningRate 0.0165 Epoch: 17 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:24,895-Speed 3412.66 samples/sec Loss 4.2009 LearningRate 0.0165 Epoch: 17 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-20 02:15:27,859-Speed 3455.86 samples/sec Loss 4.1026 LearningRate 0.0165 Epoch: 17 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:30,859-Speed 3414.64 samples/sec Loss 4.0197 LearningRate 0.0165 Epoch: 17 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:33,850-Speed 3425.04 samples/sec Loss 4.1216 LearningRate 0.0165 Epoch: 17 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:36,859-Speed 3403.50 samples/sec Loss 4.2340 LearningRate 0.0164 Epoch: 17 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:39,862-Speed 3411.48 samples/sec Loss 4.1276 LearningRate 0.0164 Epoch: 17 Global Step: 90200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:42,881-Speed 3392.48 samples/sec Loss 4.1232 LearningRate 0.0164 Epoch: 17 Global Step: 90210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:45,931-Speed 3358.13 samples/sec Loss 4.2452 LearningRate 0.0164 Epoch: 17 Global Step: 90220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:48,956-Speed 3385.90 samples/sec Loss 4.1642 LearningRate 0.0164 Epoch: 17 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:51,943-Speed 3428.62 samples/sec Loss 4.1555 LearningRate 0.0164 Epoch: 17 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:54,962-Speed 3392.65 samples/sec Loss 4.1227 LearningRate 0.0164 Epoch: 17 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:15:57,952-Speed 3426.35 samples/sec Loss 4.1820 LearningRate 0.0164 Epoch: 17 Global Step: 90260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 02:16:00,954-Speed 3412.37 samples/sec Loss 4.1533 LearningRate 0.0164 Epoch: 17 Global Step: 90270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:03,957-Speed 3410.09 
samples/sec Loss 4.1874 LearningRate 0.0164 Epoch: 17 Global Step: 90280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:06,939-Speed 3435.29 samples/sec Loss 4.1517 LearningRate 0.0164 Epoch: 17 Global Step: 90290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:09,922-Speed 3433.80 samples/sec Loss 4.0619 LearningRate 0.0163 Epoch: 17 Global Step: 90300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:12,947-Speed 3386.05 samples/sec Loss 4.2066 LearningRate 0.0163 Epoch: 17 Global Step: 90310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:15,987-Speed 3369.59 samples/sec Loss 4.1979 LearningRate 0.0163 Epoch: 17 Global Step: 90320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:18,980-Speed 3421.09 samples/sec Loss 4.0946 LearningRate 0.0163 Epoch: 17 Global Step: 90330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:21,977-Speed 3418.02 samples/sec Loss 4.1336 LearningRate 0.0163 Epoch: 17 Global Step: 90340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:24,958-Speed 3436.73 samples/sec Loss 4.1869 LearningRate 0.0163 Epoch: 17 Global Step: 90350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:27,942-Speed 3433.07 samples/sec Loss 4.1199 LearningRate 0.0163 Epoch: 17 Global Step: 90360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:30,921-Speed 3438.11 samples/sec Loss 4.1917 LearningRate 0.0163 Epoch: 17 Global Step: 90370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:16:33,927-Speed 3409.00 samples/sec Loss 4.2024 LearningRate 0.0163 Epoch: 17 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:16:36,906-Speed 3437.89 samples/sec Loss 4.1343 LearningRate 0.0163 Epoch: 17 Global Step: 90390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:16:39,889-Speed 3433.92 samples/sec Loss 4.1589 LearningRate 0.0163 Epoch: 17 
Global Step: 90400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:16:42,878-Speed 3427.12 samples/sec Loss 4.1665 LearningRate 0.0162 Epoch: 17 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:16:45,839-Speed 3458.82 samples/sec Loss 4.1963 LearningRate 0.0162 Epoch: 17 Global Step: 90420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:48,817-Speed 3438.75 samples/sec Loss 4.2229 LearningRate 0.0162 Epoch: 17 Global Step: 90430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:51,904-Speed 3318.87 samples/sec Loss 4.0941 LearningRate 0.0162 Epoch: 17 Global Step: 90440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:54,906-Speed 3411.93 samples/sec Loss 4.2078 LearningRate 0.0162 Epoch: 17 Global Step: 90450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:16:57,891-Speed 3431.00 samples/sec Loss 4.2037 LearningRate 0.0162 Epoch: 17 Global Step: 90460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:00,886-Speed 3420.54 samples/sec Loss 4.2217 LearningRate 0.0162 Epoch: 17 Global Step: 90470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:03,905-Speed 3392.43 samples/sec Loss 4.1711 LearningRate 0.0162 Epoch: 17 Global Step: 90480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:06,958-Speed 3355.66 samples/sec Loss 4.2165 LearningRate 0.0162 Epoch: 17 Global Step: 90490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:09,937-Speed 3437.45 samples/sec Loss 4.3342 LearningRate 0.0162 Epoch: 17 Global Step: 90500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:12,925-Speed 3427.52 samples/sec Loss 4.0536 LearningRate 0.0162 Epoch: 17 Global Step: 90510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:15,916-Speed 3424.95 samples/sec Loss 4.2697 LearningRate 0.0161 Epoch: 17 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-20 02:17:18,906-Speed 3425.71 samples/sec Loss 4.0964 LearningRate 0.0161 Epoch: 17 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:17:21,907-Speed 3415.22 samples/sec Loss 4.1513 LearningRate 0.0161 Epoch: 17 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:17:24,905-Speed 3416.05 samples/sec Loss 4.2221 LearningRate 0.0161 Epoch: 17 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:17:27,894-Speed 3425.97 samples/sec Loss 4.1640 LearningRate 0.0161 Epoch: 17 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:17:30,895-Speed 3414.09 samples/sec Loss 4.1603 LearningRate 0.0161 Epoch: 17 Global Step: 90570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:17:33,925-Speed 3379.54 samples/sec Loss 4.0971 LearningRate 0.0161 Epoch: 17 Global Step: 90580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:37,070-Speed 3257.38 samples/sec Loss 4.1760 LearningRate 0.0161 Epoch: 17 Global Step: 90590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:40,054-Speed 3431.74 samples/sec Loss 4.2546 LearningRate 0.0161 Epoch: 17 Global Step: 90600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:43,068-Speed 3398.47 samples/sec Loss 4.1629 LearningRate 0.0161 Epoch: 17 Global Step: 90610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:46,052-Speed 3432.57 samples/sec Loss 4.2362 LearningRate 0.0161 Epoch: 17 Global Step: 90620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:49,160-Speed 3295.62 samples/sec Loss 4.0711 LearningRate 0.0160 Epoch: 17 Global Step: 90630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:52,173-Speed 3400.36 samples/sec Loss 4.1661 LearningRate 0.0160 Epoch: 17 Global Step: 90640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:55,151-Speed 3439.64 
samples/sec Loss 4.0822 LearningRate 0.0160 Epoch: 17 Global Step: 90650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:17:58,134-Speed 3433.44 samples/sec Loss 4.0153 LearningRate 0.0160 Epoch: 17 Global Step: 90660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:01,118-Speed 3432.08 samples/sec Loss 4.1031 LearningRate 0.0160 Epoch: 17 Global Step: 90670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:04,103-Speed 3431.93 samples/sec Loss 4.2386 LearningRate 0.0160 Epoch: 17 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:18:07,090-Speed 3428.89 samples/sec Loss 4.2383 LearningRate 0.0160 Epoch: 17 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:18:10,086-Speed 3418.85 samples/sec Loss 4.1298 LearningRate 0.0160 Epoch: 17 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:18:13,107-Speed 3389.64 samples/sec Loss 4.0383 LearningRate 0.0160 Epoch: 17 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:18:16,107-Speed 3415.68 samples/sec Loss 4.1356 LearningRate 0.0160 Epoch: 17 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:18:19,160-Speed 3353.90 samples/sec Loss 4.1077 LearningRate 0.0160 Epoch: 17 Global Step: 90730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:22,140-Speed 3438.48 samples/sec Loss 4.0686 LearningRate 0.0160 Epoch: 17 Global Step: 90740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:25,218-Speed 3326.78 samples/sec Loss 4.1033 LearningRate 0.0159 Epoch: 17 Global Step: 90750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:28,229-Speed 3402.25 samples/sec Loss 4.1225 LearningRate 0.0159 Epoch: 17 Global Step: 90760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:31,266-Speed 3372.40 samples/sec Loss 4.1046 LearningRate 0.0159 Epoch: 17 
Global Step: 90770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:34,276-Speed 3402.56 samples/sec Loss 4.0239 LearningRate 0.0159 Epoch: 17 Global Step: 90780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:37,259-Speed 3433.92 samples/sec Loss 4.1757 LearningRate 0.0159 Epoch: 17 Global Step: 90790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:40,243-Speed 3432.52 samples/sec Loss 4.2309 LearningRate 0.0159 Epoch: 17 Global Step: 90800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:43,227-Speed 3432.76 samples/sec Loss 4.1553 LearningRate 0.0159 Epoch: 17 Global Step: 90810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:46,211-Speed 3433.20 samples/sec Loss 4.0633 LearningRate 0.0159 Epoch: 17 Global Step: 90820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:18:49,196-Speed 3431.08 samples/sec Loss 4.1282 LearningRate 0.0159 Epoch: 17 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:18:52,183-Speed 3429.32 samples/sec Loss 4.0655 LearningRate 0.0159 Epoch: 17 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:18:55,167-Speed 3432.06 samples/sec Loss 4.2549 LearningRate 0.0159 Epoch: 17 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:18:58,149-Speed 3435.30 samples/sec Loss 4.1874 LearningRate 0.0158 Epoch: 17 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:01,134-Speed 3430.80 samples/sec Loss 4.2283 LearningRate 0.0158 Epoch: 17 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:04,125-Speed 3425.08 samples/sec Loss 4.0806 LearningRate 0.0158 Epoch: 17 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:07,239-Speed 3288.95 samples/sec Loss 4.1253 LearningRate 0.0158 Epoch: 17 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-20 02:19:10,313-Speed 3331.65 samples/sec Loss 4.1072 LearningRate 0.0158 Epoch: 17 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:13,308-Speed 3420.84 samples/sec Loss 4.0708 LearningRate 0.0158 Epoch: 17 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:16,300-Speed 3422.80 samples/sec Loss 4.0483 LearningRate 0.0158 Epoch: 17 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:19,273-Speed 3445.74 samples/sec Loss 4.1445 LearningRate 0.0158 Epoch: 17 Global Step: 90930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:22,269-Speed 3419.16 samples/sec Loss 4.1808 LearningRate 0.0158 Epoch: 17 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:25,336-Speed 3339.19 samples/sec Loss 4.0152 LearningRate 0.0158 Epoch: 17 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:28,422-Speed 3319.04 samples/sec Loss 4.0715 LearningRate 0.0158 Epoch: 17 Global Step: 90960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:31,400-Speed 3439.12 samples/sec Loss 3.9473 LearningRate 0.0157 Epoch: 17 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:34,385-Speed 3431.35 samples/sec Loss 4.1005 LearningRate 0.0157 Epoch: 17 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:37,368-Speed 3435.00 samples/sec Loss 4.0984 LearningRate 0.0157 Epoch: 17 Global Step: 90990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:40,351-Speed 3432.77 samples/sec Loss 4.2679 LearningRate 0.0157 Epoch: 17 Global Step: 91000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:43,349-Speed 3416.31 samples/sec Loss 4.2535 LearningRate 0.0157 Epoch: 17 Global Step: 91010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:46,415-Speed 3341.79 
samples/sec Loss 4.1473 LearningRate 0.0157 Epoch: 17 Global Step: 91020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:19:49,430-Speed 3396.53 samples/sec Loss 4.1195 LearningRate 0.0157 Epoch: 17 Global Step: 91030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:19:52,739-Speed 3095.32 samples/sec Loss 4.0275 LearningRate 0.0157 Epoch: 17 Global Step: 91040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:06,881-Speed 724.16 samples/sec Loss 3.5404 LearningRate 0.0157 Epoch: 18 Global Step: 91050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:09,895-Speed 3399.50 samples/sec Loss 3.2234 LearningRate 0.0157 Epoch: 18 Global Step: 91060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:12,975-Speed 3324.69 samples/sec Loss 3.2770 LearningRate 0.0157 Epoch: 18 Global Step: 91070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:15,977-Speed 3413.20 samples/sec Loss 3.2617 LearningRate 0.0156 Epoch: 18 Global Step: 91080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:18,983-Speed 3406.69 samples/sec Loss 3.4212 LearningRate 0.0156 Epoch: 18 Global Step: 91090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:21,989-Speed 3408.14 samples/sec Loss 3.2459 LearningRate 0.0156 Epoch: 18 Global Step: 91100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:24,982-Speed 3422.31 samples/sec Loss 3.3755 LearningRate 0.0156 Epoch: 18 Global Step: 91110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:28,044-Speed 3344.96 samples/sec Loss 3.1826 LearningRate 0.0156 Epoch: 18 Global Step: 91120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:31,118-Speed 3331.30 samples/sec Loss 3.3344 LearningRate 0.0156 Epoch: 18 Global Step: 91130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:20:34,125-Speed 3406.62 samples/sec Loss 3.3015 LearningRate 0.0156 Epoch: 18 
Global Step: 91140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:20:37,103-Speed 3439.17 samples/sec Loss 3.2925 LearningRate 0.0156 Epoch: 18 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:20:40,173-Speed 3336.83 samples/sec Loss 3.3311 LearningRate 0.0156 Epoch: 18 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:20:43,226-Speed 3354.98 samples/sec Loss 3.3603 LearningRate 0.0156 Epoch: 18 Global Step: 91170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:46,232-Speed 3407.70 samples/sec Loss 3.5523 LearningRate 0.0156 Epoch: 18 Global Step: 91180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:49,244-Speed 3400.36 samples/sec Loss 3.3902 LearningRate 0.0156 Epoch: 18 Global Step: 91190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:52,252-Speed 3409.39 samples/sec Loss 3.3195 LearningRate 0.0155 Epoch: 18 Global Step: 91200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:55,281-Speed 3381.00 samples/sec Loss 3.3564 LearningRate 0.0155 Epoch: 18 Global Step: 91210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:20:58,365-Speed 3321.68 samples/sec Loss 3.1826 LearningRate 0.0155 Epoch: 18 Global Step: 91220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:21:01,356-Speed 3424.45 samples/sec Loss 3.3665 LearningRate 0.0155 Epoch: 18 Global Step: 91230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:21:04,338-Speed 3434.91 samples/sec Loss 3.4865 LearningRate 0.0155 Epoch: 18 Global Step: 91240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:21:07,341-Speed 3411.15 samples/sec Loss 3.2808 LearningRate 0.0155 Epoch: 18 Global Step: 91250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:21:10,419-Speed 3326.99 samples/sec Loss 3.3751 LearningRate 0.0155 Epoch: 18 Global Step: 91260 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-01-20 02:21:13,526-Speed 3296.89 samples/sec Loss 3.4981 LearningRate 0.0155 Epoch: 18 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:16,661-Speed 3268.51 samples/sec Loss 3.4155 LearningRate 0.0155 Epoch: 18 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:19,778-Speed 3285.15 samples/sec Loss 3.3804 LearningRate 0.0155 Epoch: 18 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:22,767-Speed 3426.95 samples/sec Loss 3.4177 LearningRate 0.0155 Epoch: 18 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:25,750-Speed 3434.24 samples/sec Loss 3.4532 LearningRate 0.0154 Epoch: 18 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:28,758-Speed 3404.29 samples/sec Loss 3.4258 LearningRate 0.0154 Epoch: 18 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:31,752-Speed 3422.39 samples/sec Loss 3.4736 LearningRate 0.0154 Epoch: 18 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:34,747-Speed 3419.54 samples/sec Loss 3.4830 LearningRate 0.0154 Epoch: 18 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:37,746-Speed 3415.92 samples/sec Loss 3.4524 LearningRate 0.0154 Epoch: 18 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:40,845-Speed 3304.07 samples/sec Loss 3.5018 LearningRate 0.0154 Epoch: 18 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:21:43,977-Speed 3270.72 samples/sec Loss 3.3896 LearningRate 0.0154 Epoch: 18 Global Step: 91370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 02:21:46,928-Speed 3470.88 samples/sec Loss 3.4477 LearningRate 0.0154 Epoch: 18 Global Step: 91380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:21:49,919-Speed 3424.67 
samples/sec Loss 3.4258 LearningRate 0.0154 Epoch: 18 Global Step: 91390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:21:52,960-Speed 3368.77 samples/sec Loss 3.4843 LearningRate 0.0154 Epoch: 18 Global Step: 91400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:21:55,950-Speed 3425.45 samples/sec Loss 3.4289 LearningRate 0.0154 Epoch: 18 Global Step: 91410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:21:59,016-Speed 3341.20 samples/sec Loss 3.5538 LearningRate 0.0153 Epoch: 18 Global Step: 91420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:22:02,009-Speed 3421.70 samples/sec Loss 3.3451 LearningRate 0.0153 Epoch: 18 Global Step: 91430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:22:05,064-Speed 3352.44 samples/sec Loss 3.4837 LearningRate 0.0153 Epoch: 18 Global Step: 91440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:22:08,117-Speed 3355.96 samples/sec Loss 3.5147 LearningRate 0.0153 Epoch: 18 Global Step: 91450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:22:11,106-Speed 3425.65 samples/sec Loss 3.4677 LearningRate 0.0153 Epoch: 18 Global Step: 91460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:22:14,166-Speed 3347.52 samples/sec Loss 3.5048 LearningRate 0.0153 Epoch: 18 Global Step: 91470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:22:17,207-Speed 3368.30 samples/sec Loss 3.4099 LearningRate 0.0153 Epoch: 18 Global Step: 91480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:20,191-Speed 3432.92 samples/sec Loss 3.4471 LearningRate 0.0153 Epoch: 18 Global Step: 91490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:23,175-Speed 3432.70 samples/sec Loss 3.4532 LearningRate 0.0153 Epoch: 18 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:26,213-Speed 3371.82 samples/sec Loss 3.3685 LearningRate 0.0153 Epoch: 18 
Global Step: 91510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:29,398-Speed 3215.60 samples/sec Loss 3.3773 LearningRate 0.0153 Epoch: 18 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:32,528-Speed 3272.28 samples/sec Loss 3.5392 LearningRate 0.0153 Epoch: 18 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:35,584-Speed 3351.97 samples/sec Loss 3.5390 LearningRate 0.0152 Epoch: 18 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:38,706-Speed 3280.37 samples/sec Loss 3.5274 LearningRate 0.0152 Epoch: 18 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:41,781-Speed 3330.53 samples/sec Loss 3.4793 LearningRate 0.0152 Epoch: 18 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:44,805-Speed 3386.97 samples/sec Loss 3.5682 LearningRate 0.0152 Epoch: 18 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:47,834-Speed 3382.38 samples/sec Loss 3.5000 LearningRate 0.0152 Epoch: 18 Global Step: 91580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 02:22:50,801-Speed 3452.33 samples/sec Loss 3.4379 LearningRate 0.0152 Epoch: 18 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:53,785-Speed 3432.05 samples/sec Loss 3.4398 LearningRate 0.0152 Epoch: 18 Global Step: 91600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:56,866-Speed 3324.90 samples/sec Loss 3.5113 LearningRate 0.0152 Epoch: 18 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:22:59,872-Speed 3407.10 samples/sec Loss 3.5470 LearningRate 0.0152 Epoch: 18 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:02,865-Speed 3422.60 samples/sec Loss 3.5027 LearningRate 0.0152 Epoch: 18 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-20 02:23:05,859-Speed 3420.73 samples/sec Loss 3.4528 LearningRate 0.0152 Epoch: 18 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:08,945-Speed 3319.78 samples/sec Loss 3.6474 LearningRate 0.0151 Epoch: 18 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:12,012-Speed 3340.05 samples/sec Loss 3.5495 LearningRate 0.0151 Epoch: 18 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:15,137-Speed 3277.85 samples/sec Loss 3.3969 LearningRate 0.0151 Epoch: 18 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:18,150-Speed 3399.61 samples/sec Loss 3.4979 LearningRate 0.0151 Epoch: 18 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:21,165-Speed 3397.73 samples/sec Loss 3.6464 LearningRate 0.0151 Epoch: 18 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:24,268-Speed 3300.81 samples/sec Loss 3.5157 LearningRate 0.0151 Epoch: 18 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:27,258-Speed 3426.21 samples/sec Loss 3.5828 LearningRate 0.0151 Epoch: 18 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:30,237-Speed 3437.40 samples/sec Loss 3.4568 LearningRate 0.0151 Epoch: 18 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:33,266-Speed 3381.73 samples/sec Loss 3.5376 LearningRate 0.0151 Epoch: 18 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:36,388-Speed 3280.63 samples/sec Loss 3.5760 LearningRate 0.0151 Epoch: 18 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:39,411-Speed 3388.68 samples/sec Loss 3.4519 LearningRate 0.0151 Epoch: 18 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:42,423-Speed 3400.96 
samples/sec Loss 3.4005 LearningRate 0.0151 Epoch: 18 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:45,412-Speed 3427.23 samples/sec Loss 3.5948 LearningRate 0.0150 Epoch: 18 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:48,459-Speed 3361.16 samples/sec Loss 3.5788 LearningRate 0.0150 Epoch: 18 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:51,427-Speed 3451.11 samples/sec Loss 3.5071 LearningRate 0.0150 Epoch: 18 Global Step: 91790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:54,434-Speed 3406.37 samples/sec Loss 3.5144 LearningRate 0.0150 Epoch: 18 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:23:57,534-Speed 3304.21 samples/sec Loss 3.5147 LearningRate 0.0150 Epoch: 18 Global Step: 91810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:00,511-Speed 3440.68 samples/sec Loss 3.6774 LearningRate 0.0150 Epoch: 18 Global Step: 91820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:03,612-Speed 3302.87 samples/sec Loss 3.5704 LearningRate 0.0150 Epoch: 18 Global Step: 91830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:06,605-Speed 3422.42 samples/sec Loss 3.4768 LearningRate 0.0150 Epoch: 18 Global Step: 91840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:09,649-Speed 3364.79 samples/sec Loss 3.6050 LearningRate 0.0150 Epoch: 18 Global Step: 91850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:12,728-Speed 3326.53 samples/sec Loss 3.4900 LearningRate 0.0150 Epoch: 18 Global Step: 91860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:15,717-Speed 3427.12 samples/sec Loss 3.5348 LearningRate 0.0150 Epoch: 18 Global Step: 91870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:18,718-Speed 3412.87 samples/sec Loss 3.5978 LearningRate 0.0149 Epoch: 18 
Global Step: 91880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:21,763-Speed 3364.15 samples/sec Loss 3.5693 LearningRate 0.0149 Epoch: 18 Global Step: 91890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:24,800-Speed 3372.75 samples/sec Loss 3.4744 LearningRate 0.0149 Epoch: 18 Global Step: 91900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:27,785-Speed 3431.07 samples/sec Loss 3.5566 LearningRate 0.0149 Epoch: 18 Global Step: 91910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:24:30,771-Speed 3429.46 samples/sec Loss 3.5537 LearningRate 0.0149 Epoch: 18 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:33,795-Speed 3387.32 samples/sec Loss 3.4627 LearningRate 0.0149 Epoch: 18 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:36,841-Speed 3363.10 samples/sec Loss 3.5748 LearningRate 0.0149 Epoch: 18 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:39,827-Speed 3430.13 samples/sec Loss 3.5199 LearningRate 0.0149 Epoch: 18 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:42,828-Speed 3413.42 samples/sec Loss 3.6426 LearningRate 0.0149 Epoch: 18 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:45,868-Speed 3369.82 samples/sec Loss 3.6528 LearningRate 0.0149 Epoch: 18 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:48,892-Speed 3386.82 samples/sec Loss 3.5651 LearningRate 0.0149 Epoch: 18 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:51,905-Speed 3399.88 samples/sec Loss 3.6425 LearningRate 0.0149 Epoch: 18 Global Step: 91990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:24:54,949-Speed 3364.32 samples/sec Loss 3.5069 LearningRate 0.0148 Epoch: 18 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-20 02:25:38,057-[lfw][92000]XNorm: 22.794350 Training: 2022-01-20 02:25:38,057-[lfw][92000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-20 02:25:38,058-[lfw][92000]Accuracy-Highest: 0.99833 Training: 2022-01-20 02:26:27,859-[cfp_fp][92000]XNorm: 20.823463 Training: 2022-01-20 02:26:27,860-[cfp_fp][92000]Accuracy-Flip: 0.97986+-0.00540 Training: 2022-01-20 02:26:27,860-[cfp_fp][92000]Accuracy-Highest: 0.98186 Training: 2022-01-20 02:27:10,852-[agedb_30][92000]XNorm: 22.791866 Training: 2022-01-20 02:27:10,853-[agedb_30][92000]Accuracy-Flip: 0.98050+-0.00723 Training: 2022-01-20 02:27:10,853-[agedb_30][92000]Accuracy-Highest: 0.98233 Training: 2022-01-20 02:27:13,983-Speed 73.65 samples/sec Loss 3.6610 LearningRate 0.0148 Epoch: 18 Global Step: 92010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:17,056-Speed 3333.32 samples/sec Loss 3.5825 LearningRate 0.0148 Epoch: 18 Global Step: 92020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:20,026-Speed 3449.00 samples/sec Loss 3.5271 LearningRate 0.0148 Epoch: 18 Global Step: 92030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:23,013-Speed 3429.04 samples/sec Loss 3.6015 LearningRate 0.0148 Epoch: 18 Global Step: 92040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:26,006-Speed 3422.26 samples/sec Loss 3.6554 LearningRate 0.0148 Epoch: 18 Global Step: 92050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:28,987-Speed 3435.60 samples/sec Loss 3.6286 LearningRate 0.0148 Epoch: 18 Global Step: 92060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:32,021-Speed 3376.03 samples/sec Loss 3.6964 LearningRate 0.0148 Epoch: 18 Global Step: 92070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:35,006-Speed 3431.37 samples/sec Loss 3.6186 LearningRate 0.0148 Epoch: 18 Global Step: 92080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 
02:27:37,983-Speed 3440.71 samples/sec Loss 3.5700 LearningRate 0.0148 Epoch: 18 Global Step: 92090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:40,990-Speed 3406.71 samples/sec Loss 3.5740 LearningRate 0.0148 Epoch: 18 Global Step: 92100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-20 02:27:43,991-Speed 3412.31 samples/sec Loss 3.5744 LearningRate 0.0148 Epoch: 18 Global Step: 92110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:27:47,004-Speed 3400.00 samples/sec Loss 3.5123 LearningRate 0.0147 Epoch: 18 Global Step: 92120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:27:49,988-Speed 3432.58 samples/sec Loss 3.6288 LearningRate 0.0147 Epoch: 18 Global Step: 92130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:27:52,979-Speed 3425.08 samples/sec Loss 3.6367 LearningRate 0.0147 Epoch: 18 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:27:55,961-Speed 3433.71 samples/sec Loss 3.5237 LearningRate 0.0147 Epoch: 18 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:27:58,957-Speed 3419.19 samples/sec Loss 3.5594 LearningRate 0.0147 Epoch: 18 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:28:01,989-Speed 3378.53 samples/sec Loss 3.6250 LearningRate 0.0147 Epoch: 18 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:28:05,118-Speed 3273.35 samples/sec Loss 3.7206 LearningRate 0.0147 Epoch: 18 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:28:08,158-Speed 3370.00 samples/sec Loss 3.6678 LearningRate 0.0147 Epoch: 18 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:28:11,340-Speed 3218.08 samples/sec Loss 3.7274 LearningRate 0.0147 Epoch: 18 Global Step: 92200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-20 02:28:14,329-Speed 3426.68 samples/sec Loss 3.6293 
LearningRate 0.0147 Epoch: 18 Global Step: 92210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-20 02:28:17,358-Speed 3381.90 samples/sec Loss 3.6617 LearningRate 0.0147 Epoch: 18 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:20,501-Speed 3258.29 samples/sec Loss 3.6971 LearningRate 0.0146 Epoch: 18 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:23,515-Speed 3399.01 samples/sec Loss 3.6320 LearningRate 0.0146 Epoch: 18 Global Step: 92240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:26,499-Speed 3432.47 samples/sec Loss 3.6456 LearningRate 0.0146 Epoch: 18 Global Step: 92250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:29,477-Speed 3440.30 samples/sec Loss 3.7471 LearningRate 0.0146 Epoch: 18 Global Step: 92260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:32,465-Speed 3427.35 samples/sec Loss 3.7270 LearningRate 0.0146 Epoch: 18 Global Step: 92270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:35,463-Speed 3416.28 samples/sec Loss 3.6382 LearningRate 0.0146 Epoch: 18 Global Step: 92280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:38,472-Speed 3405.03 samples/sec Loss 3.5690 LearningRate 0.0146 Epoch: 18 Global Step: 92290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:41,493-Speed 3390.43 samples/sec Loss 3.6402 LearningRate 0.0146 Epoch: 18 Global Step: 92300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:44,465-Speed 3445.21 samples/sec Loss 3.6098 LearningRate 0.0146 Epoch: 18 Global Step: 92310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:47,468-Speed 3411.95 samples/sec Loss 3.5704 LearningRate 0.0146 Epoch: 18 Global Step: 92320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:50,591-Speed 3278.89 samples/sec Loss 3.7112 LearningRate 0.0146 Epoch: 18 Global Step: 92330 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:53,570-Speed 3438.32 samples/sec Loss 3.7271 LearningRate 0.0146 Epoch: 18 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:56,558-Speed 3428.84 samples/sec Loss 3.6166 LearningRate 0.0145 Epoch: 18 Global Step: 92350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:28:59,554-Speed 3417.95 samples/sec Loss 3.6680 LearningRate 0.0145 Epoch: 18 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:02,542-Speed 3428.94 samples/sec Loss 3.6849 LearningRate 0.0145 Epoch: 18 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:05,557-Speed 3396.74 samples/sec Loss 3.6684 LearningRate 0.0145 Epoch: 18 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:08,666-Speed 3294.75 samples/sec Loss 3.5555 LearningRate 0.0145 Epoch: 18 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:11,717-Speed 3357.18 samples/sec Loss 3.7296 LearningRate 0.0145 Epoch: 18 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:14,701-Speed 3431.89 samples/sec Loss 3.6578 LearningRate 0.0145 Epoch: 18 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:17,658-Speed 3464.51 samples/sec Loss 3.7922 LearningRate 0.0145 Epoch: 18 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:20,675-Speed 3394.29 samples/sec Loss 3.6697 LearningRate 0.0145 Epoch: 18 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:23,684-Speed 3405.60 samples/sec Loss 3.5148 LearningRate 0.0145 Epoch: 18 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:29:26,676-Speed 3423.18 samples/sec Loss 3.7843 LearningRate 0.0145 Epoch: 18 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-20 02:29:29,635-Speed 3461.19 samples/sec Loss 3.6756 LearningRate 0.0145 Epoch: 18 Global Step: 92460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:32,646-Speed 3402.10 samples/sec Loss 3.7249 LearningRate 0.0144 Epoch: 18 Global Step: 92470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:35,657-Speed 3401.68 samples/sec Loss 3.5988 LearningRate 0.0144 Epoch: 18 Global Step: 92480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:38,645-Speed 3427.55 samples/sec Loss 3.7223 LearningRate 0.0144 Epoch: 18 Global Step: 92490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:41,643-Speed 3416.76 samples/sec Loss 3.7490 LearningRate 0.0144 Epoch: 18 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:44,783-Speed 3261.59 samples/sec Loss 3.6911 LearningRate 0.0144 Epoch: 18 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:47,835-Speed 3355.88 samples/sec Loss 3.7389 LearningRate 0.0144 Epoch: 18 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:50,826-Speed 3425.58 samples/sec Loss 3.7362 LearningRate 0.0144 Epoch: 18 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:53,812-Speed 3430.47 samples/sec Loss 3.7148 LearningRate 0.0144 Epoch: 18 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:56,817-Speed 3407.73 samples/sec Loss 3.6286 LearningRate 0.0144 Epoch: 18 Global Step: 92550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:29:59,828-Speed 3402.48 samples/sec Loss 3.6437 LearningRate 0.0144 Epoch: 18 Global Step: 92560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:02,821-Speed 3421.67 samples/sec Loss 3.6825 LearningRate 0.0144 Epoch: 18 Global Step: 92570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:05,867-Speed 3362.76 samples/sec Loss 
3.6859 LearningRate 0.0143 Epoch: 18 Global Step: 92580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:08,868-Speed 3413.78 samples/sec Loss 3.8023 LearningRate 0.0143 Epoch: 18 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:11,846-Speed 3438.43 samples/sec Loss 3.7255 LearningRate 0.0143 Epoch: 18 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:15,149-Speed 3101.11 samples/sec Loss 3.7380 LearningRate 0.0143 Epoch: 18 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:18,265-Speed 3287.59 samples/sec Loss 3.5943 LearningRate 0.0143 Epoch: 18 Global Step: 92620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:21,311-Speed 3362.21 samples/sec Loss 3.6312 LearningRate 0.0143 Epoch: 18 Global Step: 92630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:24,403-Speed 3313.01 samples/sec Loss 3.6454 LearningRate 0.0143 Epoch: 18 Global Step: 92640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:27,478-Speed 3330.70 samples/sec Loss 3.5885 LearningRate 0.0143 Epoch: 18 Global Step: 92650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:30,609-Speed 3271.15 samples/sec Loss 3.6746 LearningRate 0.0143 Epoch: 18 Global Step: 92660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:33,599-Speed 3425.75 samples/sec Loss 3.8081 LearningRate 0.0143 Epoch: 18 Global Step: 92670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:36,589-Speed 3426.22 samples/sec Loss 3.7652 LearningRate 0.0143 Epoch: 18 Global Step: 92680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:39,674-Speed 3319.60 samples/sec Loss 3.7144 LearningRate 0.0143 Epoch: 18 Global Step: 92690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:42,707-Speed 3376.77 samples/sec Loss 3.6212 LearningRate 0.0142 Epoch: 18 Global Step: 92700 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:45,712-Speed 3408.63 samples/sec Loss 3.6772 LearningRate 0.0142 Epoch: 18 Global Step: 92710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:30:48,695-Speed 3434.56 samples/sec Loss 3.6160 LearningRate 0.0142 Epoch: 18 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:51,741-Speed 3362.68 samples/sec Loss 3.7562 LearningRate 0.0142 Epoch: 18 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:54,794-Speed 3354.03 samples/sec Loss 3.6835 LearningRate 0.0142 Epoch: 18 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:30:57,771-Speed 3442.39 samples/sec Loss 3.6422 LearningRate 0.0142 Epoch: 18 Global Step: 92750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:00,754-Speed 3433.65 samples/sec Loss 3.6653 LearningRate 0.0142 Epoch: 18 Global Step: 92760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:03,740-Speed 3429.77 samples/sec Loss 3.7608 LearningRate 0.0142 Epoch: 18 Global Step: 92770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:06,725-Speed 3431.68 samples/sec Loss 3.7277 LearningRate 0.0142 Epoch: 18 Global Step: 92780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:09,724-Speed 3415.22 samples/sec Loss 3.8182 LearningRate 0.0142 Epoch: 18 Global Step: 92790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:12,712-Speed 3427.70 samples/sec Loss 3.6970 LearningRate 0.0142 Epoch: 18 Global Step: 92800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:15,799-Speed 3318.51 samples/sec Loss 3.7205 LearningRate 0.0142 Epoch: 18 Global Step: 92810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:18,814-Speed 3397.16 samples/sec Loss 3.7552 LearningRate 0.0141 Epoch: 18 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-20 02:31:21,802-Speed 3428.15 samples/sec Loss 3.6625 LearningRate 0.0141 Epoch: 18 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:24,817-Speed 3396.60 samples/sec Loss 3.8703 LearningRate 0.0141 Epoch: 18 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:31:27,831-Speed 3399.07 samples/sec Loss 3.7419 LearningRate 0.0141 Epoch: 18 Global Step: 92850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:30,810-Speed 3438.28 samples/sec Loss 3.5891 LearningRate 0.0141 Epoch: 18 Global Step: 92860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:33,815-Speed 3408.25 samples/sec Loss 3.7564 LearningRate 0.0141 Epoch: 18 Global Step: 92870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:36,835-Speed 3392.12 samples/sec Loss 3.6962 LearningRate 0.0141 Epoch: 18 Global Step: 92880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:39,830-Speed 3418.83 samples/sec Loss 3.5386 LearningRate 0.0141 Epoch: 18 Global Step: 92890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:42,874-Speed 3366.28 samples/sec Loss 3.7041 LearningRate 0.0141 Epoch: 18 Global Step: 92900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:45,916-Speed 3367.19 samples/sec Loss 3.8080 LearningRate 0.0141 Epoch: 18 Global Step: 92910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:49,082-Speed 3234.92 samples/sec Loss 3.5743 LearningRate 0.0141 Epoch: 18 Global Step: 92920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:52,163-Speed 3324.23 samples/sec Loss 3.6048 LearningRate 0.0141 Epoch: 18 Global Step: 92930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:55,154-Speed 3424.34 samples/sec Loss 3.7064 LearningRate 0.0140 Epoch: 18 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:31:58,118-Speed 3455.83 samples/sec Loss 
3.6838 LearningRate 0.0140 Epoch: 18 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:32:01,105-Speed 3429.08 samples/sec Loss 3.6071 LearningRate 0.0140 Epoch: 18 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:32:04,102-Speed 3418.10 samples/sec Loss 3.6797 LearningRate 0.0140 Epoch: 18 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:32:07,080-Speed 3438.77 samples/sec Loss 3.7544 LearningRate 0.0140 Epoch: 18 Global Step: 92980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:32:10,094-Speed 3398.29 samples/sec Loss 3.6490 LearningRate 0.0140 Epoch: 18 Global Step: 92990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:32:13,063-Speed 3450.73 samples/sec Loss 3.6503 LearningRate 0.0140 Epoch: 18 Global Step: 93000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:16,076-Speed 3398.92 samples/sec Loss 3.7100 LearningRate 0.0140 Epoch: 18 Global Step: 93010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:19,138-Speed 3345.95 samples/sec Loss 3.6752 LearningRate 0.0140 Epoch: 18 Global Step: 93020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:22,141-Speed 3410.34 samples/sec Loss 3.6956 LearningRate 0.0140 Epoch: 18 Global Step: 93030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:25,128-Speed 3428.24 samples/sec Loss 3.7719 LearningRate 0.0140 Epoch: 18 Global Step: 93040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:28,159-Speed 3380.42 samples/sec Loss 3.6922 LearningRate 0.0140 Epoch: 18 Global Step: 93050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:31,233-Speed 3331.42 samples/sec Loss 3.7433 LearningRate 0.0139 Epoch: 18 Global Step: 93060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:34,292-Speed 3348.25 samples/sec Loss 3.6632 LearningRate 0.0139 Epoch: 18 Global Step: 93070 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:37,423-Speed 3271.47 samples/sec Loss 3.7818 LearningRate 0.0139 Epoch: 18 Global Step: 93080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:40,419-Speed 3419.17 samples/sec Loss 3.6823 LearningRate 0.0139 Epoch: 18 Global Step: 93090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:43,409-Speed 3425.51 samples/sec Loss 3.6637 LearningRate 0.0139 Epoch: 18 Global Step: 93100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:32:46,527-Speed 3285.09 samples/sec Loss 3.6588 LearningRate 0.0139 Epoch: 18 Global Step: 93110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:49,546-Speed 3392.71 samples/sec Loss 3.6374 LearningRate 0.0139 Epoch: 18 Global Step: 93120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:52,579-Speed 3376.89 samples/sec Loss 3.7327 LearningRate 0.0139 Epoch: 18 Global Step: 93130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:55,642-Speed 3343.69 samples/sec Loss 3.7144 LearningRate 0.0139 Epoch: 18 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:32:58,679-Speed 3373.53 samples/sec Loss 3.7627 LearningRate 0.0139 Epoch: 18 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:01,676-Speed 3417.36 samples/sec Loss 3.6777 LearningRate 0.0139 Epoch: 18 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:04,662-Speed 3430.41 samples/sec Loss 3.7117 LearningRate 0.0139 Epoch: 18 Global Step: 93170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:07,646-Speed 3433.21 samples/sec Loss 3.6236 LearningRate 0.0138 Epoch: 18 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:10,649-Speed 3410.17 samples/sec Loss 3.7051 LearningRate 0.0138 Epoch: 18 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-20 02:33:13,670-Speed 3391.06 samples/sec Loss 3.6789 LearningRate 0.0138 Epoch: 18 Global Step: 93200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:16,713-Speed 3365.26 samples/sec Loss 3.7435 LearningRate 0.0138 Epoch: 18 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:33:19,761-Speed 3360.32 samples/sec Loss 3.7728 LearningRate 0.0138 Epoch: 18 Global Step: 93220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:33:22,745-Speed 3433.01 samples/sec Loss 3.6824 LearningRate 0.0138 Epoch: 18 Global Step: 93230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:33:25,730-Speed 3431.01 samples/sec Loss 3.6526 LearningRate 0.0138 Epoch: 18 Global Step: 93240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:33:28,733-Speed 3410.84 samples/sec Loss 3.5578 LearningRate 0.0138 Epoch: 18 Global Step: 93250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:33:31,892-Speed 3242.02 samples/sec Loss 3.6521 LearningRate 0.0138 Epoch: 18 Global Step: 93260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:33:34,852-Speed 3460.94 samples/sec Loss 3.7550 LearningRate 0.0138 Epoch: 18 Global Step: 93270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:37,835-Speed 3433.68 samples/sec Loss 3.7812 LearningRate 0.0138 Epoch: 18 Global Step: 93280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:40,817-Speed 3434.64 samples/sec Loss 3.7180 LearningRate 0.0138 Epoch: 18 Global Step: 93290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:43,801-Speed 3432.94 samples/sec Loss 3.6052 LearningRate 0.0137 Epoch: 18 Global Step: 93300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:46,788-Speed 3428.98 samples/sec Loss 3.7077 LearningRate 0.0137 Epoch: 18 Global Step: 93310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:49,830-Speed 3366.84 samples/sec Loss 
3.7140 LearningRate 0.0137 Epoch: 18 Global Step: 93320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:52,885-Speed 3352.73 samples/sec Loss 3.7081 LearningRate 0.0137 Epoch: 18 Global Step: 93330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:55,905-Speed 3392.22 samples/sec Loss 3.7500 LearningRate 0.0137 Epoch: 18 Global Step: 93340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:33:59,057-Speed 3249.00 samples/sec Loss 3.7086 LearningRate 0.0137 Epoch: 18 Global Step: 93350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:34:02,166-Speed 3295.44 samples/sec Loss 3.8088 LearningRate 0.0137 Epoch: 18 Global Step: 93360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:34:05,268-Speed 3301.62 samples/sec Loss 3.7294 LearningRate 0.0137 Epoch: 18 Global Step: 93370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:08,312-Speed 3364.44 samples/sec Loss 3.6307 LearningRate 0.0137 Epoch: 18 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:11,354-Speed 3366.87 samples/sec Loss 3.5438 LearningRate 0.0137 Epoch: 18 Global Step: 93390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:14,391-Speed 3373.10 samples/sec Loss 3.7147 LearningRate 0.0137 Epoch: 18 Global Step: 93400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:17,410-Speed 3392.29 samples/sec Loss 3.7155 LearningRate 0.0137 Epoch: 18 Global Step: 93410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:20,517-Speed 3297.01 samples/sec Loss 3.8341 LearningRate 0.0136 Epoch: 18 Global Step: 93420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:23,604-Speed 3318.19 samples/sec Loss 3.7304 LearningRate 0.0136 Epoch: 18 Global Step: 93430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:26,586-Speed 3433.82 samples/sec Loss 3.6879 LearningRate 0.0136 Epoch: 18 Global Step: 93440 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:29,603-Speed 3396.13 samples/sec Loss 3.6257 LearningRate 0.0136 Epoch: 18 Global Step: 93450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:32,595-Speed 3423.58 samples/sec Loss 3.7298 LearningRate 0.0136 Epoch: 18 Global Step: 93460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:35,600-Speed 3408.01 samples/sec Loss 3.5908 LearningRate 0.0136 Epoch: 18 Global Step: 93470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-20 02:34:38,561-Speed 3460.22 samples/sec Loss 3.6508 LearningRate 0.0136 Epoch: 18 Global Step: 93480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:41,540-Speed 3438.04 samples/sec Loss 3.6766 LearningRate 0.0136 Epoch: 18 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:44,520-Speed 3437.38 samples/sec Loss 3.6244 LearningRate 0.0136 Epoch: 18 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:47,503-Speed 3433.19 samples/sec Loss 3.7601 LearningRate 0.0136 Epoch: 18 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:50,503-Speed 3414.07 samples/sec Loss 3.8431 LearningRate 0.0136 Epoch: 18 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:53,635-Speed 3270.57 samples/sec Loss 3.7594 LearningRate 0.0136 Epoch: 18 Global Step: 93530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:56,648-Speed 3399.67 samples/sec Loss 3.8275 LearningRate 0.0135 Epoch: 18 Global Step: 93540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:34:59,643-Speed 3420.14 samples/sec Loss 3.6330 LearningRate 0.0135 Epoch: 18 Global Step: 93550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:35:02,714-Speed 3335.73 samples/sec Loss 3.7409 LearningRate 0.0135 Epoch: 18 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-20 02:35:05,705-Speed 3424.06 samples/sec Loss 3.8243 LearningRate 0.0135 Epoch: 18 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:35:08,835-Speed 3272.79 samples/sec Loss 3.7327 LearningRate 0.0135 Epoch: 18 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:35:11,786-Speed 3470.59 samples/sec Loss 3.5420 LearningRate 0.0135 Epoch: 18 Global Step: 93590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:14,782-Speed 3418.80 samples/sec Loss 3.8619 LearningRate 0.0135 Epoch: 18 Global Step: 93600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:17,786-Speed 3410.56 samples/sec Loss 3.7240 LearningRate 0.0135 Epoch: 18 Global Step: 93610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:20,773-Speed 3429.28 samples/sec Loss 3.6870 LearningRate 0.0135 Epoch: 18 Global Step: 93620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:23,785-Speed 3401.47 samples/sec Loss 3.7394 LearningRate 0.0135 Epoch: 18 Global Step: 93630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:26,769-Speed 3431.76 samples/sec Loss 3.6722 LearningRate 0.0135 Epoch: 18 Global Step: 93640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:29,753-Speed 3433.27 samples/sec Loss 3.7213 LearningRate 0.0135 Epoch: 18 Global Step: 93650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:32,749-Speed 3418.82 samples/sec Loss 3.7044 LearningRate 0.0134 Epoch: 18 Global Step: 93660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:35,734-Speed 3430.58 samples/sec Loss 3.6519 LearningRate 0.0134 Epoch: 18 Global Step: 93670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:38,788-Speed 3354.67 samples/sec Loss 3.6649 LearningRate 0.0134 Epoch: 18 Global Step: 93680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:35:41,817-Speed 3381.05 samples/sec Loss 
3.7086 LearningRate 0.0134 Epoch: 18 Global Step: 93690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:35:44,942-Speed 3277.64 samples/sec Loss 3.5556 LearningRate 0.0134 Epoch: 18 Global Step: 93700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:35:47,976-Speed 3375.84 samples/sec Loss 3.8470 LearningRate 0.0134 Epoch: 18 Global Step: 93710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:35:50,962-Speed 3430.01 samples/sec Loss 3.8144 LearningRate 0.0134 Epoch: 18 Global Step: 93720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:35:53,944-Speed 3435.05 samples/sec Loss 3.7572 LearningRate 0.0134 Epoch: 18 Global Step: 93730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:35:56,946-Speed 3411.96 samples/sec Loss 3.7155 LearningRate 0.0134 Epoch: 18 Global Step: 93740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:35:59,935-Speed 3427.32 samples/sec Loss 3.7844 LearningRate 0.0134 Epoch: 18 Global Step: 93750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:02,923-Speed 3427.80 samples/sec Loss 3.6111 LearningRate 0.0134 Epoch: 18 Global Step: 93760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:05,916-Speed 3422.03 samples/sec Loss 3.8712 LearningRate 0.0134 Epoch: 18 Global Step: 93770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:08,922-Speed 3407.55 samples/sec Loss 3.6614 LearningRate 0.0134 Epoch: 18 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:11,909-Speed 3429.39 samples/sec Loss 3.7086 LearningRate 0.0133 Epoch: 18 Global Step: 93790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:36:14,877-Speed 3450.08 samples/sec Loss 3.6767 LearningRate 0.0133 Epoch: 18 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:17,863-Speed 3430.60 samples/sec Loss 3.7616 LearningRate 0.0133 Epoch: 18 Global Step: 93810 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:20,902-Speed 3371.02 samples/sec Loss 3.8646 LearningRate 0.0133 Epoch: 18 Global Step: 93820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:24,022-Speed 3282.36 samples/sec Loss 3.6843 LearningRate 0.0133 Epoch: 18 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:27,001-Speed 3439.12 samples/sec Loss 3.7365 LearningRate 0.0133 Epoch: 18 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:29,981-Speed 3436.13 samples/sec Loss 3.7072 LearningRate 0.0133 Epoch: 18 Global Step: 93850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:32,983-Speed 3412.08 samples/sec Loss 3.6348 LearningRate 0.0133 Epoch: 18 Global Step: 93860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:35,990-Speed 3406.83 samples/sec Loss 3.7009 LearningRate 0.0133 Epoch: 18 Global Step: 93870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:38,975-Speed 3431.09 samples/sec Loss 3.7116 LearningRate 0.0133 Epoch: 18 Global Step: 93880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:41,971-Speed 3418.95 samples/sec Loss 3.8480 LearningRate 0.0133 Epoch: 18 Global Step: 93890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:44,956-Speed 3431.26 samples/sec Loss 3.7141 LearningRate 0.0133 Epoch: 18 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:36:47,939-Speed 3434.49 samples/sec Loss 3.7259 LearningRate 0.0132 Epoch: 18 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:36:50,923-Speed 3432.23 samples/sec Loss 3.8505 LearningRate 0.0132 Epoch: 18 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:36:53,914-Speed 3424.77 samples/sec Loss 3.6538 LearningRate 0.0132 Epoch: 18 Global Step: 93930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-20 02:36:56,882-Speed 3449.86 samples/sec Loss 3.6891 LearningRate 0.0132 Epoch: 18 Global Step: 93940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:36:59,868-Speed 3430.46 samples/sec Loss 3.5528 LearningRate 0.0132 Epoch: 18 Global Step: 93950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:37:02,870-Speed 3412.28 samples/sec Loss 3.7506 LearningRate 0.0132 Epoch: 18 Global Step: 93960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:37:05,957-Speed 3318.22 samples/sec Loss 3.6931 LearningRate 0.0132 Epoch: 18 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:37:09,093-Speed 3266.21 samples/sec Loss 3.8511 LearningRate 0.0132 Epoch: 18 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:37:12,173-Speed 3325.54 samples/sec Loss 3.7032 LearningRate 0.0132 Epoch: 18 Global Step: 93990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:37:15,162-Speed 3426.52 samples/sec Loss 3.6918 LearningRate 0.0132 Epoch: 18 Global Step: 94000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:37:58,205-[lfw][94000]XNorm: 21.741216 Training: 2022-01-20 02:37:58,206-[lfw][94000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-20 02:37:58,206-[lfw][94000]Accuracy-Highest: 0.99833 Training: 2022-01-20 02:38:47,974-[cfp_fp][94000]XNorm: 20.248166 Training: 2022-01-20 02:38:47,975-[cfp_fp][94000]Accuracy-Flip: 0.98429+-0.00728 Training: 2022-01-20 02:38:47,975-[cfp_fp][94000]Accuracy-Highest: 0.98429 Training: 2022-01-20 02:39:30,807-[agedb_30][94000]XNorm: 21.936711 Training: 2022-01-20 02:39:30,808-[agedb_30][94000]Accuracy-Flip: 0.98367+-0.00562 Training: 2022-01-20 02:39:30,808-[agedb_30][94000]Accuracy-Highest: 0.98367 Training: 2022-01-20 02:39:33,790-Speed 73.87 samples/sec Loss 3.7132 LearningRate 0.0132 Epoch: 18 Global Step: 94010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:39:36,762-Speed 3445.89 
samples/sec Loss 3.7327 LearningRate 0.0132 Epoch: 18 Global Step: 94020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:39:39,739-Speed 3441.02 samples/sec Loss 3.7698 LearningRate 0.0131 Epoch: 18 Global Step: 94030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:39:42,713-Speed 3444.39 samples/sec Loss 3.7394 LearningRate 0.0131 Epoch: 18 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:39:45,695-Speed 3435.26 samples/sec Loss 3.7266 LearningRate 0.0131 Epoch: 18 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:39:48,717-Speed 3420.73 samples/sec Loss 3.6785 LearningRate 0.0131 Epoch: 18 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:39:51,866-Speed 3251.98 samples/sec Loss 3.6316 LearningRate 0.0131 Epoch: 18 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:39:54,925-Speed 3348.21 samples/sec Loss 3.6854 LearningRate 0.0131 Epoch: 18 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:39:57,944-Speed 3432.98 samples/sec Loss 3.7285 LearningRate 0.0131 Epoch: 18 Global Step: 94090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:40:00,926-Speed 3434.81 samples/sec Loss 3.6175 LearningRate 0.0131 Epoch: 18 Global Step: 94100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:40:03,973-Speed 3395.51 samples/sec Loss 3.8343 LearningRate 0.0131 Epoch: 18 Global Step: 94110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:40:06,956-Speed 3434.62 samples/sec Loss 3.7087 LearningRate 0.0131 Epoch: 18 Global Step: 94120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:40:09,927-Speed 3446.82 samples/sec Loss 3.7197 LearningRate 0.0131 Epoch: 18 Global Step: 94130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:40:13,835-Speed 3373.18 samples/sec Loss 3.6506 LearningRate 0.0131 Epoch: 18 
Global Step: 94140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:40:16,846-Speed 3400.95 samples/sec Loss 3.6841 LearningRate 0.0130 Epoch: 18 Global Step: 94150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:40:19,902-Speed 3352.77 samples/sec Loss 3.5912 LearningRate 0.0130 Epoch: 18 Global Step: 94160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:40:22,902-Speed 3413.43 samples/sec Loss 3.7084 LearningRate 0.0130 Epoch: 18 Global Step: 94170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:25,896-Speed 3420.74 samples/sec Loss 3.7466 LearningRate 0.0130 Epoch: 18 Global Step: 94180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:28,919-Speed 3388.80 samples/sec Loss 3.5908 LearningRate 0.0130 Epoch: 18 Global Step: 94190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:31,994-Speed 3330.36 samples/sec Loss 3.6036 LearningRate 0.0130 Epoch: 18 Global Step: 94200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:34,973-Speed 3438.27 samples/sec Loss 3.6931 LearningRate 0.0130 Epoch: 18 Global Step: 94210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:37,954-Speed 3436.75 samples/sec Loss 3.7676 LearningRate 0.0130 Epoch: 18 Global Step: 94220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:40,932-Speed 3438.98 samples/sec Loss 3.7270 LearningRate 0.0130 Epoch: 18 Global Step: 94230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:43,907-Speed 3443.79 samples/sec Loss 3.6227 LearningRate 0.0130 Epoch: 18 Global Step: 94240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:46,880-Speed 3444.71 samples/sec Loss 3.6856 LearningRate 0.0130 Epoch: 18 Global Step: 94250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:40:49,886-Speed 3408.16 samples/sec Loss 3.6239 LearningRate 0.0130 Epoch: 18 Global Step: 94260 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-01-20 02:40:52,884-Speed 3416.32 samples/sec Loss 3.8021 LearningRate 0.0130 Epoch: 18 Global Step: 94270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:40:55,862-Speed 3438.53 samples/sec Loss 3.7052 LearningRate 0.0129 Epoch: 18 Global Step: 94280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:40:58,847-Speed 3431.70 samples/sec Loss 3.8240 LearningRate 0.0129 Epoch: 18 Global Step: 94290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:01,848-Speed 3413.18 samples/sec Loss 3.5847 LearningRate 0.0129 Epoch: 18 Global Step: 94300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:04,846-Speed 3416.81 samples/sec Loss 3.6824 LearningRate 0.0129 Epoch: 18 Global Step: 94310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:07,827-Speed 3436.03 samples/sec Loss 3.6806 LearningRate 0.0129 Epoch: 18 Global Step: 94320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:10,828-Speed 3412.56 samples/sec Loss 3.5967 LearningRate 0.0129 Epoch: 18 Global Step: 94330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:13,802-Speed 3444.59 samples/sec Loss 3.6281 LearningRate 0.0129 Epoch: 18 Global Step: 94340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:16,801-Speed 3414.77 samples/sec Loss 3.7807 LearningRate 0.0129 Epoch: 18 Global Step: 94350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:19,787-Speed 3431.25 samples/sec Loss 3.7574 LearningRate 0.0129 Epoch: 18 Global Step: 94360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:22,759-Speed 3445.59 samples/sec Loss 3.7053 LearningRate 0.0129 Epoch: 18 Global Step: 94370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:25,740-Speed 3436.59 samples/sec Loss 3.6415 LearningRate 0.0129 Epoch: 18 Global Step: 94380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:28,740-Speed 3413.74 
samples/sec Loss 3.7795 LearningRate 0.0129 Epoch: 18 Global Step: 94390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:31,745-Speed 3409.36 samples/sec Loss 3.5990 LearningRate 0.0128 Epoch: 18 Global Step: 94400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:34,734-Speed 3427.43 samples/sec Loss 3.5532 LearningRate 0.0128 Epoch: 18 Global Step: 94410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:37,738-Speed 3408.93 samples/sec Loss 3.6935 LearningRate 0.0128 Epoch: 18 Global Step: 94420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:40,780-Speed 3366.98 samples/sec Loss 3.6523 LearningRate 0.0128 Epoch: 18 Global Step: 94430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:43,765-Speed 3431.02 samples/sec Loss 3.6961 LearningRate 0.0128 Epoch: 18 Global Step: 94440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:46,791-Speed 3384.88 samples/sec Loss 3.5043 LearningRate 0.0128 Epoch: 18 Global Step: 94450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:49,792-Speed 3413.19 samples/sec Loss 3.7334 LearningRate 0.0128 Epoch: 18 Global Step: 94460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:52,852-Speed 3347.35 samples/sec Loss 3.5831 LearningRate 0.0128 Epoch: 18 Global Step: 94470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:41:55,996-Speed 3258.05 samples/sec Loss 3.6697 LearningRate 0.0128 Epoch: 18 Global Step: 94480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:41:59,093-Speed 3308.08 samples/sec Loss 3.7653 LearningRate 0.0128 Epoch: 18 Global Step: 94490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:02,076-Speed 3433.69 samples/sec Loss 3.5175 LearningRate 0.0128 Epoch: 18 Global Step: 94500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:05,061-Speed 3430.91 samples/sec Loss 3.6248 LearningRate 0.0128 Epoch: 18 
Global Step: 94510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:08,050-Speed 3427.27 samples/sec Loss 3.7765 LearningRate 0.0128 Epoch: 18 Global Step: 94520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:11,041-Speed 3423.74 samples/sec Loss 3.7276 LearningRate 0.0127 Epoch: 18 Global Step: 94530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:14,024-Speed 3433.82 samples/sec Loss 3.6102 LearningRate 0.0127 Epoch: 18 Global Step: 94540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:17,094-Speed 3336.63 samples/sec Loss 3.7551 LearningRate 0.0127 Epoch: 18 Global Step: 94550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:20,192-Speed 3305.84 samples/sec Loss 3.6215 LearningRate 0.0127 Epoch: 18 Global Step: 94560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:23,207-Speed 3397.92 samples/sec Loss 3.6173 LearningRate 0.0127 Epoch: 18 Global Step: 94570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:26,210-Speed 3411.49 samples/sec Loss 3.6158 LearningRate 0.0127 Epoch: 18 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:42:29,229-Speed 3393.19 samples/sec Loss 3.5874 LearningRate 0.0127 Epoch: 18 Global Step: 94590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:32,257-Speed 3381.55 samples/sec Loss 3.7921 LearningRate 0.0127 Epoch: 18 Global Step: 94600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:35,230-Speed 3445.38 samples/sec Loss 3.6257 LearningRate 0.0127 Epoch: 18 Global Step: 94610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:38,254-Speed 3387.41 samples/sec Loss 3.6659 LearningRate 0.0127 Epoch: 18 Global Step: 94620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:41,292-Speed 3371.44 samples/sec Loss 3.6014 LearningRate 0.0127 Epoch: 18 Global Step: 94630 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-20 02:42:44,287-Speed 3420.15 samples/sec Loss 3.7038 LearningRate 0.0127 Epoch: 18 Global Step: 94640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:47,264-Speed 3440.46 samples/sec Loss 3.6893 LearningRate 0.0126 Epoch: 18 Global Step: 94650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:50,261-Speed 3417.51 samples/sec Loss 3.5679 LearningRate 0.0126 Epoch: 18 Global Step: 94660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:53,272-Speed 3401.70 samples/sec Loss 3.6822 LearningRate 0.0126 Epoch: 18 Global Step: 94670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:56,368-Speed 3308.45 samples/sec Loss 3.6783 LearningRate 0.0126 Epoch: 18 Global Step: 94680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:42:59,346-Speed 3439.26 samples/sec Loss 3.6774 LearningRate 0.0126 Epoch: 18 Global Step: 94690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:02,326-Speed 3437.56 samples/sec Loss 3.5793 LearningRate 0.0126 Epoch: 18 Global Step: 94700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:05,318-Speed 3422.60 samples/sec Loss 3.6725 LearningRate 0.0126 Epoch: 18 Global Step: 94710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:08,288-Speed 3450.33 samples/sec Loss 3.6207 LearningRate 0.0126 Epoch: 18 Global Step: 94720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:11,285-Speed 3417.58 samples/sec Loss 3.6721 LearningRate 0.0126 Epoch: 18 Global Step: 94730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:14,313-Speed 3382.17 samples/sec Loss 3.6335 LearningRate 0.0126 Epoch: 18 Global Step: 94740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:17,305-Speed 3424.12 samples/sec Loss 3.6128 LearningRate 0.0126 Epoch: 18 Global Step: 94750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:20,313-Speed 3405.04 
samples/sec Loss 3.5879 LearningRate 0.0126 Epoch: 18 Global Step: 94760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:23,345-Speed 3377.82 samples/sec Loss 3.6825 LearningRate 0.0126 Epoch: 18 Global Step: 94770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:26,387-Speed 3366.86 samples/sec Loss 3.6884 LearningRate 0.0125 Epoch: 18 Global Step: 94780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:29,382-Speed 3420.81 samples/sec Loss 3.5773 LearningRate 0.0125 Epoch: 18 Global Step: 94790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:32,422-Speed 3369.01 samples/sec Loss 3.5951 LearningRate 0.0125 Epoch: 18 Global Step: 94800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:35,493-Speed 3334.95 samples/sec Loss 3.6680 LearningRate 0.0125 Epoch: 18 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:38,498-Speed 3409.05 samples/sec Loss 3.7795 LearningRate 0.0125 Epoch: 18 Global Step: 94820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:41,558-Speed 3346.26 samples/sec Loss 3.6750 LearningRate 0.0125 Epoch: 18 Global Step: 94830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:44,555-Speed 3418.02 samples/sec Loss 3.6585 LearningRate 0.0125 Epoch: 18 Global Step: 94840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:47,557-Speed 3412.07 samples/sec Loss 3.5810 LearningRate 0.0125 Epoch: 18 Global Step: 94850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:50,684-Speed 3275.72 samples/sec Loss 3.5084 LearningRate 0.0125 Epoch: 18 Global Step: 94860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:53,717-Speed 3377.02 samples/sec Loss 3.7864 LearningRate 0.0125 Epoch: 18 Global Step: 94870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:43:56,733-Speed 3395.88 samples/sec Loss 3.6316 LearningRate 0.0125 Epoch: 18 
Global Step: 94880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:43:59,719-Speed 3430.77 samples/sec Loss 3.6310 LearningRate 0.0125 Epoch: 18 Global Step: 94890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:02,697-Speed 3438.72 samples/sec Loss 3.7354 LearningRate 0.0125 Epoch: 18 Global Step: 94900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:05,674-Speed 3440.48 samples/sec Loss 3.6203 LearningRate 0.0124 Epoch: 18 Global Step: 94910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:08,655-Speed 3437.65 samples/sec Loss 3.7193 LearningRate 0.0124 Epoch: 18 Global Step: 94920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:11,785-Speed 3271.93 samples/sec Loss 3.5865 LearningRate 0.0124 Epoch: 18 Global Step: 94930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:14,792-Speed 3407.65 samples/sec Loss 3.5436 LearningRate 0.0124 Epoch: 18 Global Step: 94940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:17,776-Speed 3432.82 samples/sec Loss 3.6285 LearningRate 0.0124 Epoch: 18 Global Step: 94950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:20,807-Speed 3379.35 samples/sec Loss 3.6499 LearningRate 0.0124 Epoch: 18 Global Step: 94960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:23,785-Speed 3439.80 samples/sec Loss 3.5943 LearningRate 0.0124 Epoch: 18 Global Step: 94970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:26,907-Speed 3280.44 samples/sec Loss 3.6654 LearningRate 0.0124 Epoch: 18 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:44:29,985-Speed 3327.65 samples/sec Loss 3.6700 LearningRate 0.0124 Epoch: 18 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:44:32,987-Speed 3412.20 samples/sec Loss 3.5090 LearningRate 0.0124 Epoch: 18 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-01-20 02:44:36,119-Speed 3270.03 samples/sec Loss 3.5961 LearningRate 0.0124 Epoch: 18 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:44:39,256-Speed 3265.66 samples/sec Loss 3.5763 LearningRate 0.0124 Epoch: 18 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:44:42,227-Speed 3447.27 samples/sec Loss 3.6934 LearningRate 0.0123 Epoch: 18 Global Step: 95030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:45,215-Speed 3429.37 samples/sec Loss 3.6395 LearningRate 0.0123 Epoch: 18 Global Step: 95040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:48,201-Speed 3429.65 samples/sec Loss 3.6458 LearningRate 0.0123 Epoch: 18 Global Step: 95050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:51,238-Speed 3372.71 samples/sec Loss 3.6505 LearningRate 0.0123 Epoch: 18 Global Step: 95060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:54,412-Speed 3226.46 samples/sec Loss 3.5681 LearningRate 0.0123 Epoch: 18 Global Step: 95070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:44:57,565-Speed 3249.32 samples/sec Loss 3.7600 LearningRate 0.0123 Epoch: 18 Global Step: 95080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:00,561-Speed 3418.20 samples/sec Loss 3.6706 LearningRate 0.0123 Epoch: 18 Global Step: 95090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:03,551-Speed 3425.68 samples/sec Loss 3.5178 LearningRate 0.0123 Epoch: 18 Global Step: 95100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:06,539-Speed 3428.07 samples/sec Loss 3.7986 LearningRate 0.0123 Epoch: 18 Global Step: 95110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:09,615-Speed 3329.83 samples/sec Loss 3.7767 LearningRate 0.0123 Epoch: 18 Global Step: 95120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:12,629-Speed 3398.56 
samples/sec Loss 3.7100 LearningRate 0.0123 Epoch: 18 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:45:15,596-Speed 3452.09 samples/sec Loss 3.6987 LearningRate 0.0123 Epoch: 18 Global Step: 95140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:18,612-Speed 3396.39 samples/sec Loss 3.6500 LearningRate 0.0123 Epoch: 18 Global Step: 95150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:21,604-Speed 3423.37 samples/sec Loss 3.7931 LearningRate 0.0122 Epoch: 18 Global Step: 95160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:24,621-Speed 3395.25 samples/sec Loss 3.7597 LearningRate 0.0122 Epoch: 18 Global Step: 95170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:27,606-Speed 3430.43 samples/sec Loss 3.5706 LearningRate 0.0122 Epoch: 18 Global Step: 95180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:30,605-Speed 3416.14 samples/sec Loss 3.7142 LearningRate 0.0122 Epoch: 18 Global Step: 95190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:33,601-Speed 3418.09 samples/sec Loss 3.6812 LearningRate 0.0122 Epoch: 18 Global Step: 95200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:36,625-Speed 3388.05 samples/sec Loss 3.7977 LearningRate 0.0122 Epoch: 18 Global Step: 95210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:39,680-Speed 3352.75 samples/sec Loss 3.5268 LearningRate 0.0122 Epoch: 18 Global Step: 95220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:42,674-Speed 3420.62 samples/sec Loss 3.7130 LearningRate 0.0122 Epoch: 18 Global Step: 95230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:45:45,685-Speed 3402.02 samples/sec Loss 3.6666 LearningRate 0.0122 Epoch: 18 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:45:48,836-Speed 3250.03 samples/sec Loss 3.5777 LearningRate 0.0122 Epoch: 18 
Global Step: 95250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:45:51,942-Speed 3297.54 samples/sec Loss 3.6095 LearningRate 0.0122 Epoch: 18 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:45:54,954-Speed 3401.80 samples/sec Loss 3.7274 LearningRate 0.0122 Epoch: 18 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:45:57,942-Speed 3427.65 samples/sec Loss 3.5945 LearningRate 0.0122 Epoch: 18 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:00,933-Speed 3424.35 samples/sec Loss 3.6765 LearningRate 0.0121 Epoch: 18 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:04,014-Speed 3324.66 samples/sec Loss 3.6029 LearningRate 0.0121 Epoch: 18 Global Step: 95300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:07,090-Speed 3329.36 samples/sec Loss 3.5300 LearningRate 0.0121 Epoch: 18 Global Step: 95310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:10,076-Speed 3430.30 samples/sec Loss 3.5110 LearningRate 0.0121 Epoch: 18 Global Step: 95320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:13,119-Speed 3365.77 samples/sec Loss 3.6297 LearningRate 0.0121 Epoch: 18 Global Step: 95330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:16,098-Speed 3438.13 samples/sec Loss 3.7787 LearningRate 0.0121 Epoch: 18 Global Step: 95340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-20 02:46:19,084-Speed 3431.19 samples/sec Loss 3.5885 LearningRate 0.0121 Epoch: 18 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:22,125-Speed 3368.29 samples/sec Loss 3.6733 LearningRate 0.0121 Epoch: 18 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:25,118-Speed 3421.61 samples/sec Loss 3.5138 LearningRate 0.0121 Epoch: 18 Global Step: 95370 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-01-20 02:46:28,123-Speed 3408.92 samples/sec Loss 3.6648 LearningRate 0.0121 Epoch: 18 Global Step: 95380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:31,112-Speed 3427.29 samples/sec Loss 3.5933 LearningRate 0.0121 Epoch: 18 Global Step: 95390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:34,149-Speed 3372.22 samples/sec Loss 3.4793 LearningRate 0.0121 Epoch: 18 Global Step: 95400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:46:37,163-Speed 3398.31 samples/sec Loss 3.6172 LearningRate 0.0121 Epoch: 18 Global Step: 95410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:46:40,192-Speed 3381.97 samples/sec Loss 3.5425 LearningRate 0.0120 Epoch: 18 Global Step: 95420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:46:43,289-Speed 3307.43 samples/sec Loss 3.6399 LearningRate 0.0120 Epoch: 18 Global Step: 95430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:46:46,298-Speed 3403.87 samples/sec Loss 3.6759 LearningRate 0.0120 Epoch: 18 Global Step: 95440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:46:49,278-Speed 3436.51 samples/sec Loss 3.8352 LearningRate 0.0120 Epoch: 18 Global Step: 95450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:46:52,341-Speed 3344.75 samples/sec Loss 3.6980 LearningRate 0.0120 Epoch: 18 Global Step: 95460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:46:55,336-Speed 3419.11 samples/sec Loss 3.7043 LearningRate 0.0120 Epoch: 18 Global Step: 95470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:46:58,328-Speed 3423.88 samples/sec Loss 3.6082 LearningRate 0.0120 Epoch: 18 Global Step: 95480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:01,361-Speed 3376.85 samples/sec Loss 3.7068 LearningRate 0.0120 Epoch: 18 Global Step: 95490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:04,346-Speed 3431.94 
samples/sec Loss 3.6791 LearningRate 0.0120 Epoch: 18 Global Step: 95500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:07,344-Speed 3416.85 samples/sec Loss 3.6398 LearningRate 0.0120 Epoch: 18 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:47:10,476-Speed 3269.81 samples/sec Loss 3.6556 LearningRate 0.0120 Epoch: 18 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:47:13,529-Speed 3355.43 samples/sec Loss 3.6521 LearningRate 0.0120 Epoch: 18 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:47:16,524-Speed 3419.43 samples/sec Loss 3.7382 LearningRate 0.0120 Epoch: 18 Global Step: 95540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:19,518-Speed 3420.68 samples/sec Loss 3.4780 LearningRate 0.0119 Epoch: 18 Global Step: 95550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:22,512-Speed 3420.99 samples/sec Loss 3.5452 LearningRate 0.0119 Epoch: 18 Global Step: 95560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:25,561-Speed 3359.45 samples/sec Loss 3.5853 LearningRate 0.0119 Epoch: 18 Global Step: 95570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:28,659-Speed 3306.35 samples/sec Loss 3.5539 LearningRate 0.0119 Epoch: 18 Global Step: 95580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:31,684-Speed 3385.95 samples/sec Loss 3.6723 LearningRate 0.0119 Epoch: 18 Global Step: 95590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:34,667-Speed 3434.09 samples/sec Loss 3.5934 LearningRate 0.0119 Epoch: 18 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:37,703-Speed 3373.27 samples/sec Loss 3.6460 LearningRate 0.0119 Epoch: 18 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:40,836-Speed 3270.01 samples/sec Loss 3.5228 LearningRate 0.0119 Epoch: 18 
Global Step: 95620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:43,836-Speed 3414.03 samples/sec Loss 3.4452 LearningRate 0.0119 Epoch: 18 Global Step: 95630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:47:46,827-Speed 3424.08 samples/sec Loss 3.6861 LearningRate 0.0119 Epoch: 18 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:47:49,813-Speed 3430.57 samples/sec Loss 3.7212 LearningRate 0.0119 Epoch: 18 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:47:52,797-Speed 3432.54 samples/sec Loss 3.5213 LearningRate 0.0119 Epoch: 18 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:47:55,833-Speed 3373.65 samples/sec Loss 3.5809 LearningRate 0.0119 Epoch: 18 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:47:58,821-Speed 3428.29 samples/sec Loss 3.6347 LearningRate 0.0118 Epoch: 18 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:48:01,858-Speed 3372.86 samples/sec Loss 3.6223 LearningRate 0.0118 Epoch: 18 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:48:04,853-Speed 3419.97 samples/sec Loss 3.5930 LearningRate 0.0118 Epoch: 18 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:48:07,927-Speed 3331.89 samples/sec Loss 3.7139 LearningRate 0.0118 Epoch: 18 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:48:10,952-Speed 3385.70 samples/sec Loss 3.5277 LearningRate 0.0118 Epoch: 18 Global Step: 95720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:13,938-Speed 3430.44 samples/sec Loss 3.4739 LearningRate 0.0118 Epoch: 18 Global Step: 95730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:16,926-Speed 3427.87 samples/sec Loss 3.6654 LearningRate 0.0118 Epoch: 18 Global Step: 95740 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-20 02:48:19,910-Speed 3433.16 samples/sec Loss 3.7376 LearningRate 0.0118 Epoch: 18 Global Step: 95750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:22,937-Speed 3383.15 samples/sec Loss 3.5979 LearningRate 0.0118 Epoch: 18 Global Step: 95760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:25,924-Speed 3429.76 samples/sec Loss 3.6976 LearningRate 0.0118 Epoch: 18 Global Step: 95770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:28,919-Speed 3420.50 samples/sec Loss 3.5945 LearningRate 0.0118 Epoch: 18 Global Step: 95780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:31,902-Speed 3433.58 samples/sec Loss 3.5624 LearningRate 0.0118 Epoch: 18 Global Step: 95790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:35,022-Speed 3282.61 samples/sec Loss 3.5813 LearningRate 0.0118 Epoch: 18 Global Step: 95800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:38,122-Speed 3303.71 samples/sec Loss 3.6370 LearningRate 0.0117 Epoch: 18 Global Step: 95810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:41,236-Speed 3288.67 samples/sec Loss 3.6537 LearningRate 0.0117 Epoch: 18 Global Step: 95820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:48:44,289-Speed 3356.37 samples/sec Loss 3.6469 LearningRate 0.0117 Epoch: 18 Global Step: 95830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:47,270-Speed 3435.94 samples/sec Loss 3.5227 LearningRate 0.0117 Epoch: 18 Global Step: 95840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:50,264-Speed 3421.79 samples/sec Loss 3.5685 LearningRate 0.0117 Epoch: 18 Global Step: 95850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:53,256-Speed 3422.51 samples/sec Loss 3.5391 LearningRate 0.0117 Epoch: 18 Global Step: 95860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:56,277-Speed 3390.87 
samples/sec Loss 3.6423 LearningRate 0.0117 Epoch: 18 Global Step: 95870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:48:59,267-Speed 3425.76 samples/sec Loss 3.7392 LearningRate 0.0117 Epoch: 18 Global Step: 95880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:02,436-Speed 3231.92 samples/sec Loss 3.6977 LearningRate 0.0117 Epoch: 18 Global Step: 95890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:05,437-Speed 3413.59 samples/sec Loss 3.6029 LearningRate 0.0117 Epoch: 18 Global Step: 95900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:08,421-Speed 3432.70 samples/sec Loss 3.7066 LearningRate 0.0117 Epoch: 18 Global Step: 95910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:11,402-Speed 3435.61 samples/sec Loss 3.6410 LearningRate 0.0117 Epoch: 18 Global Step: 95920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:14,384-Speed 3435.26 samples/sec Loss 3.5021 LearningRate 0.0117 Epoch: 18 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:49:17,347-Speed 3456.79 samples/sec Loss 3.7559 LearningRate 0.0116 Epoch: 18 Global Step: 95940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:20,369-Speed 3389.06 samples/sec Loss 3.6558 LearningRate 0.0116 Epoch: 18 Global Step: 95950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:23,503-Speed 3268.21 samples/sec Loss 3.6006 LearningRate 0.0116 Epoch: 18 Global Step: 95960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:26,532-Speed 3381.46 samples/sec Loss 3.6370 LearningRate 0.0116 Epoch: 18 Global Step: 95970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:29,571-Speed 3370.39 samples/sec Loss 3.5670 LearningRate 0.0116 Epoch: 18 Global Step: 95980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:32,623-Speed 3356.08 samples/sec Loss 3.6111 LearningRate 0.0116 Epoch: 18 
Global Step: 95990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:49:35,671-Speed 3361.00 samples/sec Loss 3.5177 LearningRate 0.0116 Epoch: 18 Global Step: 96000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:50:18,774-[lfw][96000]XNorm: 23.618117 Training: 2022-01-20 02:50:18,775-[lfw][96000]Accuracy-Flip: 0.99817+-0.00263 Training: 2022-01-20 02:50:18,776-[lfw][96000]Accuracy-Highest: 0.99833 Training: 2022-01-20 02:51:08,554-[cfp_fp][96000]XNorm: 21.777177 Training: 2022-01-20 02:51:08,555-[cfp_fp][96000]Accuracy-Flip: 0.98300+-0.00517 Training: 2022-01-20 02:51:08,555-[cfp_fp][96000]Accuracy-Highest: 0.98429 Training: 2022-01-20 02:51:51,385-[agedb_30][96000]XNorm: 23.527977 Training: 2022-01-20 02:51:51,386-[agedb_30][96000]Accuracy-Flip: 0.98283+-0.00658 Training: 2022-01-20 02:51:51,387-[agedb_30][96000]Accuracy-Highest: 0.98367 Training: 2022-01-20 02:51:54,367-Speed 73.83 samples/sec Loss 3.5678 LearningRate 0.0116 Epoch: 18 Global Step: 96010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:51:57,337-Speed 3448.29 samples/sec Loss 3.4741 LearningRate 0.0116 Epoch: 18 Global Step: 96020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:52:00,309-Speed 3445.86 samples/sec Loss 3.5877 LearningRate 0.0116 Epoch: 18 Global Step: 96030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:52:03,323-Speed 3399.33 samples/sec Loss 3.5396 LearningRate 0.0116 Epoch: 18 Global Step: 96040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:52:06,314-Speed 3424.58 samples/sec Loss 3.6999 LearningRate 0.0116 Epoch: 18 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:52:09,274-Speed 3459.58 samples/sec Loss 3.5858 LearningRate 0.0116 Epoch: 18 Global Step: 96060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:52:12,239-Speed 3455.14 samples/sec Loss 3.4841 LearningRate 0.0115 Epoch: 18 Global Step: 96070 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-01-20 02:52:15,215-Speed 3441.55 samples/sec Loss 3.5431 LearningRate 0.0115 Epoch: 18 Global Step: 96080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:18,208-Speed 3422.51 samples/sec Loss 3.5759 LearningRate 0.0115 Epoch: 18 Global Step: 96090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:21,380-Speed 3228.80 samples/sec Loss 3.4899 LearningRate 0.0115 Epoch: 18 Global Step: 96100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:35,923-Speed 704.18 samples/sec Loss 2.9834 LearningRate 0.0115 Epoch: 19 Global Step: 96110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:39,132-Speed 3191.83 samples/sec Loss 2.8053 LearningRate 0.0115 Epoch: 19 Global Step: 96120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:42,119-Speed 3429.40 samples/sec Loss 2.8927 LearningRate 0.0115 Epoch: 19 Global Step: 96130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:45,095-Speed 3442.20 samples/sec Loss 2.7391 LearningRate 0.0115 Epoch: 19 Global Step: 96140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:48,115-Speed 3391.89 samples/sec Loss 2.7808 LearningRate 0.0115 Epoch: 19 Global Step: 96150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:51,103-Speed 3427.65 samples/sec Loss 2.7959 LearningRate 0.0115 Epoch: 19 Global Step: 96160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:52:54,122-Speed 3392.81 samples/sec Loss 2.8624 LearningRate 0.0115 Epoch: 19 Global Step: 96170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:52:57,104-Speed 3434.76 samples/sec Loss 2.9081 LearningRate 0.0115 Epoch: 19 Global Step: 96180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:00,084-Speed 3436.04 samples/sec Loss 2.8555 LearningRate 0.0115 Epoch: 19 Global Step: 96190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 
02:53:03,146-Speed 3345.65 samples/sec Loss 2.7199 LearningRate 0.0114 Epoch: 19 Global Step: 96200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:06,136-Speed 3425.58 samples/sec Loss 2.6858 LearningRate 0.0114 Epoch: 19 Global Step: 96210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:09,207-Speed 3336.00 samples/sec Loss 2.7887 LearningRate 0.0114 Epoch: 19 Global Step: 96220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:12,210-Speed 3410.32 samples/sec Loss 2.6932 LearningRate 0.0114 Epoch: 19 Global Step: 96230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:15,198-Speed 3428.54 samples/sec Loss 2.7444 LearningRate 0.0114 Epoch: 19 Global Step: 96240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:18,181-Speed 3433.75 samples/sec Loss 2.8522 LearningRate 0.0114 Epoch: 19 Global Step: 96250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:21,166-Speed 3432.05 samples/sec Loss 2.8232 LearningRate 0.0114 Epoch: 19 Global Step: 96260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:24,145-Speed 3437.86 samples/sec Loss 2.8819 LearningRate 0.0114 Epoch: 19 Global Step: 96270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:53:27,132-Speed 3429.14 samples/sec Loss 2.8174 LearningRate 0.0114 Epoch: 19 Global Step: 96280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:53:30,114-Speed 3434.32 samples/sec Loss 2.7617 LearningRate 0.0114 Epoch: 19 Global Step: 96290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:53:33,096-Speed 3436.16 samples/sec Loss 2.8171 LearningRate 0.0114 Epoch: 19 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:53:36,115-Speed 3392.44 samples/sec Loss 2.8297 LearningRate 0.0114 Epoch: 19 Global Step: 96310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:53:39,100-Speed 3432.21 samples/sec Loss 2.8815 
LearningRate 0.0114 Epoch: 19 Global Step: 96320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:42,121-Speed 3389.83 samples/sec Loss 2.8239 LearningRate 0.0113 Epoch: 19 Global Step: 96330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:45,126-Speed 3408.61 samples/sec Loss 2.8236 LearningRate 0.0113 Epoch: 19 Global Step: 96340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:48,121-Speed 3420.15 samples/sec Loss 2.9001 LearningRate 0.0113 Epoch: 19 Global Step: 96350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:51,111-Speed 3426.00 samples/sec Loss 2.8593 LearningRate 0.0113 Epoch: 19 Global Step: 96360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:54,126-Speed 3397.58 samples/sec Loss 2.8542 LearningRate 0.0113 Epoch: 19 Global Step: 96370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:53:57,102-Speed 3441.46 samples/sec Loss 2.8371 LearningRate 0.0113 Epoch: 19 Global Step: 96380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:00,109-Speed 3406.36 samples/sec Loss 2.7994 LearningRate 0.0113 Epoch: 19 Global Step: 96390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:03,102-Speed 3422.71 samples/sec Loss 2.9743 LearningRate 0.0113 Epoch: 19 Global Step: 96400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:06,075-Speed 3444.66 samples/sec Loss 2.8587 LearningRate 0.0113 Epoch: 19 Global Step: 96410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:09,047-Speed 3446.51 samples/sec Loss 2.9501 LearningRate 0.0113 Epoch: 19 Global Step: 96420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:54:12,049-Speed 3412.06 samples/sec Loss 2.9225 LearningRate 0.0113 Epoch: 19 Global Step: 96430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:54:15,027-Speed 3440.03 samples/sec Loss 2.8961 LearningRate 0.0113 Epoch: 19 Global Step: 96440 Fp16 
Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:18,006-Speed 3439.73 samples/sec Loss 2.8589 LearningRate 0.0113 Epoch: 19 Global Step: 96450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:20,993-Speed 3429.24 samples/sec Loss 2.8280 LearningRate 0.0112 Epoch: 19 Global Step: 96460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:23,980-Speed 3429.02 samples/sec Loss 2.9946 LearningRate 0.0112 Epoch: 19 Global Step: 96470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:26,960-Speed 3437.37 samples/sec Loss 3.0000 LearningRate 0.0112 Epoch: 19 Global Step: 96480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:29,935-Speed 3442.64 samples/sec Loss 2.8322 LearningRate 0.0112 Epoch: 19 Global Step: 96490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:32,912-Speed 3440.31 samples/sec Loss 2.8613 LearningRate 0.0112 Epoch: 19 Global Step: 96500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:35,891-Speed 3438.93 samples/sec Loss 2.9116 LearningRate 0.0112 Epoch: 19 Global Step: 96510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:38,886-Speed 3420.15 samples/sec Loss 2.8472 LearningRate 0.0112 Epoch: 19 Global Step: 96520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:41,859-Speed 3444.46 samples/sec Loss 3.0782 LearningRate 0.0112 Epoch: 19 Global Step: 96530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:44,843-Speed 3432.50 samples/sec Loss 2.8779 LearningRate 0.0112 Epoch: 19 Global Step: 96540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:47,832-Speed 3426.81 samples/sec Loss 3.0469 LearningRate 0.0112 Epoch: 19 Global Step: 96550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:54:50,828-Speed 3419.07 samples/sec Loss 3.0836 LearningRate 0.0112 Epoch: 19 Global Step: 96560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2022-01-20 02:54:53,819-Speed 3423.99 samples/sec Loss 2.9640 LearningRate 0.0112 Epoch: 19 Global Step: 96570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:56,804-Speed 3432.32 samples/sec Loss 2.8826 LearningRate 0.0112 Epoch: 19 Global Step: 96580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:54:59,784-Speed 3436.18 samples/sec Loss 2.9911 LearningRate 0.0112 Epoch: 19 Global Step: 96590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:02,764-Speed 3437.96 samples/sec Loss 2.8522 LearningRate 0.0111 Epoch: 19 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:05,739-Speed 3442.89 samples/sec Loss 3.1358 LearningRate 0.0111 Epoch: 19 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:08,726-Speed 3429.40 samples/sec Loss 2.9512 LearningRate 0.0111 Epoch: 19 Global Step: 96620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:11,743-Speed 3394.52 samples/sec Loss 2.8934 LearningRate 0.0111 Epoch: 19 Global Step: 96630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:14,727-Speed 3432.55 samples/sec Loss 3.0002 LearningRate 0.0111 Epoch: 19 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:17,713-Speed 3430.35 samples/sec Loss 2.9324 LearningRate 0.0111 Epoch: 19 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:20,837-Speed 3278.53 samples/sec Loss 2.9927 LearningRate 0.0111 Epoch: 19 Global Step: 96660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:24,018-Speed 3220.07 samples/sec Loss 3.0173 LearningRate 0.0111 Epoch: 19 Global Step: 96670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:55:26,997-Speed 3439.04 samples/sec Loss 2.9702 LearningRate 0.0111 Epoch: 19 Global Step: 96680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:55:30,033-Speed 3373.91 samples/sec Loss 
3.0043 LearningRate 0.0111 Epoch: 19 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:55:33,042-Speed 3406.44 samples/sec Loss 2.8604 LearningRate 0.0111 Epoch: 19 Global Step: 96700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:36,090-Speed 3359.89 samples/sec Loss 3.0622 LearningRate 0.0111 Epoch: 19 Global Step: 96710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:39,125-Speed 3374.54 samples/sec Loss 2.9749 LearningRate 0.0111 Epoch: 19 Global Step: 96720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:42,222-Speed 3307.87 samples/sec Loss 3.0613 LearningRate 0.0110 Epoch: 19 Global Step: 96730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:45,243-Speed 3389.45 samples/sec Loss 2.9443 LearningRate 0.0110 Epoch: 19 Global Step: 96740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:48,219-Speed 3442.06 samples/sec Loss 2.9142 LearningRate 0.0110 Epoch: 19 Global Step: 96750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:51,227-Speed 3405.12 samples/sec Loss 3.1202 LearningRate 0.0110 Epoch: 19 Global Step: 96760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:54,270-Speed 3367.07 samples/sec Loss 2.9095 LearningRate 0.0110 Epoch: 19 Global Step: 96770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:55:57,271-Speed 3413.19 samples/sec Loss 3.1084 LearningRate 0.0110 Epoch: 19 Global Step: 96780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:00,252-Speed 3435.76 samples/sec Loss 3.0051 LearningRate 0.0110 Epoch: 19 Global Step: 96790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:03,219-Speed 3452.61 samples/sec Loss 2.9388 LearningRate 0.0110 Epoch: 19 Global Step: 96800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:06,203-Speed 3432.78 samples/sec Loss 2.9998 LearningRate 0.0110 Epoch: 19 Global Step: 96810 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:09,183-Speed 3436.59 samples/sec Loss 3.0343 LearningRate 0.0110 Epoch: 19 Global Step: 96820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:12,263-Speed 3325.44 samples/sec Loss 2.9003 LearningRate 0.0110 Epoch: 19 Global Step: 96830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:15,395-Speed 3270.71 samples/sec Loss 3.0281 LearningRate 0.0110 Epoch: 19 Global Step: 96840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:18,384-Speed 3427.13 samples/sec Loss 3.0860 LearningRate 0.0110 Epoch: 19 Global Step: 96850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:21,410-Speed 3384.24 samples/sec Loss 3.0590 LearningRate 0.0110 Epoch: 19 Global Step: 96860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:24,496-Speed 3319.22 samples/sec Loss 3.0002 LearningRate 0.0109 Epoch: 19 Global Step: 96870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:27,476-Speed 3437.59 samples/sec Loss 2.8910 LearningRate 0.0109 Epoch: 19 Global Step: 96880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:30,515-Speed 3369.49 samples/sec Loss 3.0900 LearningRate 0.0109 Epoch: 19 Global Step: 96890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:33,578-Speed 3344.15 samples/sec Loss 3.0315 LearningRate 0.0109 Epoch: 19 Global Step: 96900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:56:36,593-Speed 3397.46 samples/sec Loss 3.1595 LearningRate 0.0109 Epoch: 19 Global Step: 96910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:56:39,576-Speed 3433.86 samples/sec Loss 3.0911 LearningRate 0.0109 Epoch: 19 Global Step: 96920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:56:42,593-Speed 3394.51 samples/sec Loss 3.0763 LearningRate 0.0109 Epoch: 19 Global Step: 96930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-20 02:56:45,772-Speed 3221.96 samples/sec Loss 3.0937 LearningRate 0.0109 Epoch: 19 Global Step: 96940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:48,858-Speed 3319.49 samples/sec Loss 2.9805 LearningRate 0.0109 Epoch: 19 Global Step: 96950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:51,860-Speed 3412.33 samples/sec Loss 3.0118 LearningRate 0.0109 Epoch: 19 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:54,850-Speed 3426.02 samples/sec Loss 3.0291 LearningRate 0.0109 Epoch: 19 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:56:57,894-Speed 3363.83 samples/sec Loss 2.9573 LearningRate 0.0109 Epoch: 19 Global Step: 96980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:00,886-Speed 3424.02 samples/sec Loss 3.0667 LearningRate 0.0109 Epoch: 19 Global Step: 96990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:03,896-Speed 3403.87 samples/sec Loss 3.1119 LearningRate 0.0108 Epoch: 19 Global Step: 97000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:06,891-Speed 3419.80 samples/sec Loss 3.0792 LearningRate 0.0108 Epoch: 19 Global Step: 97010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:09,957-Speed 3340.44 samples/sec Loss 2.9350 LearningRate 0.0108 Epoch: 19 Global Step: 97020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:13,081-Speed 3278.46 samples/sec Loss 3.1363 LearningRate 0.0108 Epoch: 19 Global Step: 97030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:57:16,090-Speed 3405.07 samples/sec Loss 3.0824 LearningRate 0.0108 Epoch: 19 Global Step: 97040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:57:19,151-Speed 3346.09 samples/sec Loss 2.9985 LearningRate 0.0108 Epoch: 19 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:57:22,141-Speed 3425.89 samples/sec Loss 
2.9445 LearningRate 0.0108 Epoch: 19 Global Step: 97060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:25,126-Speed 3430.19 samples/sec Loss 3.1812 LearningRate 0.0108 Epoch: 19 Global Step: 97070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:28,154-Speed 3383.18 samples/sec Loss 3.1448 LearningRate 0.0108 Epoch: 19 Global Step: 97080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:31,137-Speed 3433.34 samples/sec Loss 3.0849 LearningRate 0.0108 Epoch: 19 Global Step: 97090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:34,172-Speed 3375.80 samples/sec Loss 3.0402 LearningRate 0.0108 Epoch: 19 Global Step: 97100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:37,246-Speed 3331.19 samples/sec Loss 3.1924 LearningRate 0.0108 Epoch: 19 Global Step: 97110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:40,237-Speed 3424.66 samples/sec Loss 3.1621 LearningRate 0.0108 Epoch: 19 Global Step: 97120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:43,219-Speed 3434.99 samples/sec Loss 3.1355 LearningRate 0.0108 Epoch: 19 Global Step: 97130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:46,203-Speed 3432.85 samples/sec Loss 3.1328 LearningRate 0.0107 Epoch: 19 Global Step: 97140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:49,191-Speed 3427.47 samples/sec Loss 3.0267 LearningRate 0.0107 Epoch: 19 Global Step: 97150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:57:52,178-Speed 3429.39 samples/sec Loss 3.1057 LearningRate 0.0107 Epoch: 19 Global Step: 97160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:57:55,164-Speed 3430.36 samples/sec Loss 3.0467 LearningRate 0.0107 Epoch: 19 Global Step: 97170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:57:58,175-Speed 3401.48 samples/sec Loss 3.0077 LearningRate 0.0107 Epoch: 19 Global Step: 97180 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:58:01,188-Speed 3399.53 samples/sec Loss 3.1485 LearningRate 0.0107 Epoch: 19 Global Step: 97190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:58:04,161-Speed 3445.54 samples/sec Loss 3.1001 LearningRate 0.0107 Epoch: 19 Global Step: 97200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:58:07,203-Speed 3366.74 samples/sec Loss 3.0506 LearningRate 0.0107 Epoch: 19 Global Step: 97210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:58:10,188-Speed 3431.52 samples/sec Loss 3.1023 LearningRate 0.0107 Epoch: 19 Global Step: 97220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:58:13,229-Speed 3369.15 samples/sec Loss 3.0294 LearningRate 0.0107 Epoch: 19 Global Step: 97230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:16,357-Speed 3274.88 samples/sec Loss 3.0576 LearningRate 0.0107 Epoch: 19 Global Step: 97240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:19,358-Speed 3413.62 samples/sec Loss 2.9977 LearningRate 0.0107 Epoch: 19 Global Step: 97250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:22,346-Speed 3427.00 samples/sec Loss 3.0907 LearningRate 0.0107 Epoch: 19 Global Step: 97260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:25,327-Speed 3436.57 samples/sec Loss 3.0550 LearningRate 0.0107 Epoch: 19 Global Step: 97270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:28,348-Speed 3389.96 samples/sec Loss 3.0070 LearningRate 0.0106 Epoch: 19 Global Step: 97280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:31,384-Speed 3374.47 samples/sec Loss 3.1170 LearningRate 0.0106 Epoch: 19 Global Step: 97290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:34,365-Speed 3435.94 samples/sec Loss 3.1585 LearningRate 0.0106 Epoch: 19 Global Step: 97300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2022-01-20 02:58:37,352-Speed 3428.65 samples/sec Loss 3.0586 LearningRate 0.0106 Epoch: 19 Global Step: 97310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:40,350-Speed 3416.72 samples/sec Loss 3.0718 LearningRate 0.0106 Epoch: 19 Global Step: 97320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 02:58:43,356-Speed 3408.02 samples/sec Loss 3.1649 LearningRate 0.0106 Epoch: 19 Global Step: 97330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:58:46,365-Speed 3404.14 samples/sec Loss 2.9912 LearningRate 0.0106 Epoch: 19 Global Step: 97340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:58:49,387-Speed 3388.80 samples/sec Loss 3.1609 LearningRate 0.0106 Epoch: 19 Global Step: 97350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:58:52,370-Speed 3433.78 samples/sec Loss 3.1023 LearningRate 0.0106 Epoch: 19 Global Step: 97360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:58:55,353-Speed 3434.06 samples/sec Loss 3.1338 LearningRate 0.0106 Epoch: 19 Global Step: 97370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:58:58,361-Speed 3404.41 samples/sec Loss 3.0957 LearningRate 0.0106 Epoch: 19 Global Step: 97380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:59:01,345-Speed 3432.19 samples/sec Loss 3.0182 LearningRate 0.0106 Epoch: 19 Global Step: 97390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:59:04,331-Speed 3431.16 samples/sec Loss 3.0999 LearningRate 0.0106 Epoch: 19 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:59:07,324-Speed 3422.11 samples/sec Loss 3.1006 LearningRate 0.0105 Epoch: 19 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:59:10,317-Speed 3421.68 samples/sec Loss 3.0863 LearningRate 0.0105 Epoch: 19 Global Step: 97420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 02:59:13,301-Speed 3434.08 samples/sec Loss 
3.0572 LearningRate 0.0105 Epoch: 19 Global Step: 97430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:16,290-Speed 3427.19 samples/sec Loss 3.1081 LearningRate 0.0105 Epoch: 19 Global Step: 97440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:19,288-Speed 3416.09 samples/sec Loss 3.0592 LearningRate 0.0105 Epoch: 19 Global Step: 97450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:22,462-Speed 3227.41 samples/sec Loss 3.0457 LearningRate 0.0105 Epoch: 19 Global Step: 97460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:25,535-Speed 3332.97 samples/sec Loss 3.1596 LearningRate 0.0105 Epoch: 19 Global Step: 97470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:28,516-Speed 3435.34 samples/sec Loss 3.0438 LearningRate 0.0105 Epoch: 19 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:31,539-Speed 3389.23 samples/sec Loss 3.0562 LearningRate 0.0105 Epoch: 19 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:34,524-Speed 3431.88 samples/sec Loss 3.1373 LearningRate 0.0105 Epoch: 19 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:37,595-Speed 3334.65 samples/sec Loss 2.9950 LearningRate 0.0105 Epoch: 19 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:40,592-Speed 3418.35 samples/sec Loss 3.2579 LearningRate 0.0105 Epoch: 19 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:43,567-Speed 3442.17 samples/sec Loss 3.1419 LearningRate 0.0105 Epoch: 19 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:46,548-Speed 3436.68 samples/sec Loss 2.9478 LearningRate 0.0105 Epoch: 19 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:49,571-Speed 3388.65 samples/sec Loss 3.0330 LearningRate 0.0104 Epoch: 19 Global Step: 97550 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:52,567-Speed 3418.58 samples/sec Loss 3.0726 LearningRate 0.0104 Epoch: 19 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:55,554-Speed 3428.77 samples/sec Loss 3.0558 LearningRate 0.0104 Epoch: 19 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 02:59:58,593-Speed 3370.51 samples/sec Loss 3.1602 LearningRate 0.0104 Epoch: 19 Global Step: 97580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:01,730-Speed 3265.08 samples/sec Loss 3.0385 LearningRate 0.0104 Epoch: 19 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:04,725-Speed 3419.93 samples/sec Loss 3.0878 LearningRate 0.0104 Epoch: 19 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:07,877-Speed 3250.08 samples/sec Loss 3.0705 LearningRate 0.0104 Epoch: 19 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:10,864-Speed 3429.09 samples/sec Loss 3.0036 LearningRate 0.0104 Epoch: 19 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:13,831-Speed 3452.04 samples/sec Loss 3.0501 LearningRate 0.0104 Epoch: 19 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:16,823-Speed 3422.59 samples/sec Loss 3.1597 LearningRate 0.0104 Epoch: 19 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:19,813-Speed 3425.96 samples/sec Loss 3.0690 LearningRate 0.0104 Epoch: 19 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:22,867-Speed 3353.75 samples/sec Loss 3.1359 LearningRate 0.0104 Epoch: 19 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:25,917-Speed 3358.74 samples/sec Loss 3.0419 LearningRate 0.0104 Epoch: 19 Global Step: 97670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-20 03:00:28,922-Speed 3412.48 samples/sec Loss 3.0125 LearningRate 0.0104 Epoch: 19 Global Step: 97680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:31,913-Speed 3424.72 samples/sec Loss 3.0834 LearningRate 0.0103 Epoch: 19 Global Step: 97690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:34,909-Speed 3418.54 samples/sec Loss 3.0871 LearningRate 0.0103 Epoch: 19 Global Step: 97700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:37,891-Speed 3434.66 samples/sec Loss 3.1153 LearningRate 0.0103 Epoch: 19 Global Step: 97710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:40,875-Speed 3432.53 samples/sec Loss 3.1754 LearningRate 0.0103 Epoch: 19 Global Step: 97720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:43,886-Speed 3401.88 samples/sec Loss 3.0477 LearningRate 0.0103 Epoch: 19 Global Step: 97730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:46,941-Speed 3353.21 samples/sec Loss 3.1078 LearningRate 0.0103 Epoch: 19 Global Step: 97740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:49,930-Speed 3426.41 samples/sec Loss 3.1684 LearningRate 0.0103 Epoch: 19 Global Step: 97750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:52,970-Speed 3370.22 samples/sec Loss 3.1987 LearningRate 0.0103 Epoch: 19 Global Step: 97760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:00:56,002-Speed 3377.68 samples/sec Loss 3.1983 LearningRate 0.0103 Epoch: 19 Global Step: 97770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:00:59,016-Speed 3398.74 samples/sec Loss 3.1192 LearningRate 0.0103 Epoch: 19 Global Step: 97780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:01:02,042-Speed 3384.65 samples/sec Loss 3.1010 LearningRate 0.0103 Epoch: 19 Global Step: 97790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:05,072-Speed 3380.42 samples/sec Loss 
2.9936 LearningRate 0.0103 Epoch: 19 Global Step: 97800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:08,134-Speed 3344.82 samples/sec Loss 3.0370 LearningRate 0.0103 Epoch: 19 Global Step: 97810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:11,165-Speed 3379.41 samples/sec Loss 3.1489 LearningRate 0.0103 Epoch: 19 Global Step: 97820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:14,156-Speed 3424.62 samples/sec Loss 3.1702 LearningRate 0.0102 Epoch: 19 Global Step: 97830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:17,137-Speed 3435.99 samples/sec Loss 3.2512 LearningRate 0.0102 Epoch: 19 Global Step: 97840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:20,120-Speed 3434.03 samples/sec Loss 3.1828 LearningRate 0.0102 Epoch: 19 Global Step: 97850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:23,150-Speed 3380.87 samples/sec Loss 3.1376 LearningRate 0.0102 Epoch: 19 Global Step: 97860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:26,142-Speed 3422.27 samples/sec Loss 3.1150 LearningRate 0.0102 Epoch: 19 Global Step: 97870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:29,126-Speed 3433.03 samples/sec Loss 3.1008 LearningRate 0.0102 Epoch: 19 Global Step: 97880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:32,111-Speed 3431.12 samples/sec Loss 3.0959 LearningRate 0.0102 Epoch: 19 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:01:35,094-Speed 3434.86 samples/sec Loss 3.1102 LearningRate 0.0102 Epoch: 19 Global Step: 97900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:01:38,081-Speed 3429.76 samples/sec Loss 3.0063 LearningRate 0.0102 Epoch: 19 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:01:41,052-Speed 3446.51 samples/sec Loss 3.0960 LearningRate 0.0102 Epoch: 19 Global Step: 97920 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:44,033-Speed 3437.12 samples/sec Loss 3.1983 LearningRate 0.0102 Epoch: 19 Global Step: 97930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:47,019-Speed 3429.39 samples/sec Loss 3.1813 LearningRate 0.0102 Epoch: 19 Global Step: 97940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:50,032-Speed 3400.21 samples/sec Loss 3.1077 LearningRate 0.0102 Epoch: 19 Global Step: 97950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:53,039-Speed 3406.71 samples/sec Loss 3.0880 LearningRate 0.0102 Epoch: 19 Global Step: 97960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:56,019-Speed 3436.51 samples/sec Loss 3.1851 LearningRate 0.0101 Epoch: 19 Global Step: 97970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:01:59,004-Speed 3431.57 samples/sec Loss 3.1179 LearningRate 0.0101 Epoch: 19 Global Step: 97980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:02:02,003-Speed 3415.48 samples/sec Loss 3.1324 LearningRate 0.0101 Epoch: 19 Global Step: 97990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:02:04,988-Speed 3431.56 samples/sec Loss 3.1364 LearningRate 0.0101 Epoch: 19 Global Step: 98000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:02:47,850-[lfw][98000]XNorm: 22.796062 Training: 2022-01-20 03:02:47,851-[lfw][98000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-20 03:02:47,851-[lfw][98000]Accuracy-Highest: 0.99833 Training: 2022-01-20 03:03:37,874-[cfp_fp][98000]XNorm: 21.111165 Training: 2022-01-20 03:03:37,875-[cfp_fp][98000]Accuracy-Flip: 0.98329+-0.00465 Training: 2022-01-20 03:03:37,875-[cfp_fp][98000]Accuracy-Highest: 0.98429 Training: 2022-01-20 03:04:20,997-[agedb_30][98000]XNorm: 22.821802 Training: 2022-01-20 03:04:20,998-[agedb_30][98000]Accuracy-Flip: 0.98433+-0.00554 Training: 2022-01-20 03:04:20,998-[agedb_30][98000]Accuracy-Highest: 0.98433 
Training: 2022-01-20 03:04:24,000-Speed 73.66 samples/sec Loss 3.1837 LearningRate 0.0101 Epoch: 19 Global Step: 98010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:04:26,968-Speed 3449.94 samples/sec Loss 3.1889 LearningRate 0.0101 Epoch: 19 Global Step: 98020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:04:29,946-Speed 3440.17 samples/sec Loss 3.2068 LearningRate 0.0101 Epoch: 19 Global Step: 98030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:04:32,979-Speed 3377.32 samples/sec Loss 3.1488 LearningRate 0.0101 Epoch: 19 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:04:35,957-Speed 3438.66 samples/sec Loss 2.9575 LearningRate 0.0101 Epoch: 19 Global Step: 98050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:04:38,937-Speed 3437.24 samples/sec Loss 3.1845 LearningRate 0.0101 Epoch: 19 Global Step: 98060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:04:41,951-Speed 3398.31 samples/sec Loss 3.0127 LearningRate 0.0101 Epoch: 19 Global Step: 98070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:04:44,955-Speed 3409.64 samples/sec Loss 3.0018 LearningRate 0.0101 Epoch: 19 Global Step: 98080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:04:47,956-Speed 3412.98 samples/sec Loss 3.1643 LearningRate 0.0101 Epoch: 19 Global Step: 98090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:04:50,963-Speed 3407.29 samples/sec Loss 3.1581 LearningRate 0.0101 Epoch: 19 Global Step: 98100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:04:53,944-Speed 3435.30 samples/sec Loss 3.1317 LearningRate 0.0100 Epoch: 19 Global Step: 98110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:04:56,924-Speed 3437.29 samples/sec Loss 3.2218 LearningRate 0.0100 Epoch: 19 Global Step: 98120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:04:59,906-Speed 3434.45 
samples/sec Loss 3.0676 LearningRate 0.0100 Epoch: 19 Global Step: 98130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:02,919-Speed 3400.39 samples/sec Loss 3.0818 LearningRate 0.0100 Epoch: 19 Global Step: 98140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:05,907-Speed 3427.74 samples/sec Loss 3.1718 LearningRate 0.0100 Epoch: 19 Global Step: 98150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:08,913-Speed 3408.09 samples/sec Loss 3.1584 LearningRate 0.0100 Epoch: 19 Global Step: 98160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:11,891-Speed 3438.56 samples/sec Loss 3.1958 LearningRate 0.0100 Epoch: 19 Global Step: 98170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:14,875-Speed 3432.42 samples/sec Loss 3.1124 LearningRate 0.0100 Epoch: 19 Global Step: 98180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:17,869-Speed 3421.80 samples/sec Loss 3.1781 LearningRate 0.0100 Epoch: 19 Global Step: 98190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:05:20,851-Speed 3435.39 samples/sec Loss 3.1203 LearningRate 0.0100 Epoch: 19 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:05:23,827-Speed 3442.29 samples/sec Loss 3.1598 LearningRate 0.0100 Epoch: 19 Global Step: 98210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:26,811-Speed 3432.22 samples/sec Loss 3.2729 LearningRate 0.0100 Epoch: 19 Global Step: 98220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:29,823-Speed 3399.90 samples/sec Loss 3.0130 LearningRate 0.0100 Epoch: 19 Global Step: 98230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:32,827-Speed 3410.63 samples/sec Loss 3.1322 LearningRate 0.0100 Epoch: 19 Global Step: 98240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:35,825-Speed 3415.60 samples/sec Loss 3.2545 LearningRate 0.0099 Epoch: 19 
Global Step: 98250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:38,852-Speed 3384.62 samples/sec Loss 3.1854 LearningRate 0.0099 Epoch: 19 Global Step: 98260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:41,895-Speed 3365.58 samples/sec Loss 3.1808 LearningRate 0.0099 Epoch: 19 Global Step: 98270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:44,912-Speed 3394.78 samples/sec Loss 3.1313 LearningRate 0.0099 Epoch: 19 Global Step: 98280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:47,918-Speed 3408.39 samples/sec Loss 3.2097 LearningRate 0.0099 Epoch: 19 Global Step: 98290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:50,930-Speed 3400.34 samples/sec Loss 3.1260 LearningRate 0.0099 Epoch: 19 Global Step: 98300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:05:53,933-Speed 3411.19 samples/sec Loss 3.1004 LearningRate 0.0099 Epoch: 19 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:05:56,927-Speed 3420.10 samples/sec Loss 3.1785 LearningRate 0.0099 Epoch: 19 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:05:59,930-Speed 3411.52 samples/sec Loss 3.2301 LearningRate 0.0099 Epoch: 19 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:02,915-Speed 3431.56 samples/sec Loss 3.0940 LearningRate 0.0099 Epoch: 19 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:05,931-Speed 3395.39 samples/sec Loss 3.1296 LearningRate 0.0099 Epoch: 19 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:08,953-Speed 3389.15 samples/sec Loss 3.0511 LearningRate 0.0099 Epoch: 19 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:12,069-Speed 3288.05 samples/sec Loss 3.0352 LearningRate 0.0099 Epoch: 19 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-01-20 03:06:15,118-Speed 3358.87 samples/sec Loss 3.3201 LearningRate 0.0099 Epoch: 19 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:18,127-Speed 3404.72 samples/sec Loss 3.0967 LearningRate 0.0098 Epoch: 19 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:21,118-Speed 3424.39 samples/sec Loss 3.2981 LearningRate 0.0098 Epoch: 19 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:24,117-Speed 3414.75 samples/sec Loss 3.1078 LearningRate 0.0098 Epoch: 19 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:27,159-Speed 3367.35 samples/sec Loss 3.1461 LearningRate 0.0098 Epoch: 19 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:30,166-Speed 3406.51 samples/sec Loss 3.1180 LearningRate 0.0098 Epoch: 19 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:33,195-Speed 3381.37 samples/sec Loss 3.2375 LearningRate 0.0098 Epoch: 19 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:36,189-Speed 3420.65 samples/sec Loss 3.1457 LearningRate 0.0098 Epoch: 19 Global Step: 98450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:39,187-Speed 3418.71 samples/sec Loss 3.2374 LearningRate 0.0098 Epoch: 19 Global Step: 98460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:42,238-Speed 3357.19 samples/sec Loss 3.2566 LearningRate 0.0098 Epoch: 19 Global Step: 98470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:45,217-Speed 3438.35 samples/sec Loss 3.1359 LearningRate 0.0098 Epoch: 19 Global Step: 98480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:48,293-Speed 3329.64 samples/sec Loss 3.0306 LearningRate 0.0098 Epoch: 19 Global Step: 98490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:51,271-Speed 3438.93 
samples/sec Loss 3.1686 LearningRate 0.0098 Epoch: 19 Global Step: 98500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:54,352-Speed 3324.93 samples/sec Loss 3.2298 LearningRate 0.0098 Epoch: 19 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:06:57,454-Speed 3301.35 samples/sec Loss 3.1485 LearningRate 0.0098 Epoch: 19 Global Step: 98520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:07:00,518-Speed 3342.70 samples/sec Loss 3.2923 LearningRate 0.0098 Epoch: 19 Global Step: 98530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:03,516-Speed 3417.16 samples/sec Loss 3.1626 LearningRate 0.0097 Epoch: 19 Global Step: 98540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:06,492-Speed 3442.06 samples/sec Loss 3.1172 LearningRate 0.0097 Epoch: 19 Global Step: 98550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:09,485-Speed 3422.03 samples/sec Loss 3.1394 LearningRate 0.0097 Epoch: 19 Global Step: 98560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:12,472-Speed 3429.09 samples/sec Loss 3.0427 LearningRate 0.0097 Epoch: 19 Global Step: 98570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:15,451-Speed 3438.28 samples/sec Loss 3.3087 LearningRate 0.0097 Epoch: 19 Global Step: 98580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:18,472-Speed 3390.17 samples/sec Loss 3.1006 LearningRate 0.0097 Epoch: 19 Global Step: 98590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:21,450-Speed 3440.03 samples/sec Loss 3.1689 LearningRate 0.0097 Epoch: 19 Global Step: 98600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:24,444-Speed 3420.90 samples/sec Loss 3.1246 LearningRate 0.0097 Epoch: 19 Global Step: 98610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:27,429-Speed 3431.31 samples/sec Loss 3.0874 LearningRate 0.0097 Epoch: 19 
Global Step: 98620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:30,408-Speed 3437.63 samples/sec Loss 3.1649 LearningRate 0.0097 Epoch: 19 Global Step: 98630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:07:33,389-Speed 3436.59 samples/sec Loss 3.1725 LearningRate 0.0097 Epoch: 19 Global Step: 98640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:07:36,371-Speed 3435.67 samples/sec Loss 3.2008 LearningRate 0.0097 Epoch: 19 Global Step: 98650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:07:39,359-Speed 3428.02 samples/sec Loss 3.0559 LearningRate 0.0097 Epoch: 19 Global Step: 98660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:07:42,328-Speed 3450.37 samples/sec Loss 3.0991 LearningRate 0.0097 Epoch: 19 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:45,305-Speed 3440.69 samples/sec Loss 3.2447 LearningRate 0.0096 Epoch: 19 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:48,380-Speed 3330.80 samples/sec Loss 3.1565 LearningRate 0.0096 Epoch: 19 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:51,458-Speed 3327.91 samples/sec Loss 3.2194 LearningRate 0.0096 Epoch: 19 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:54,445-Speed 3428.31 samples/sec Loss 3.1986 LearningRate 0.0096 Epoch: 19 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:07:57,479-Speed 3376.06 samples/sec Loss 3.1513 LearningRate 0.0096 Epoch: 19 Global Step: 98720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:00,470-Speed 3424.39 samples/sec Loss 3.3193 LearningRate 0.0096 Epoch: 19 Global Step: 98730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:03,455-Speed 3431.87 samples/sec Loss 2.9986 LearningRate 0.0096 Epoch: 19 Global Step: 98740 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-20 03:08:06,470-Speed 3397.49 samples/sec Loss 3.3133 LearningRate 0.0096 Epoch: 19 Global Step: 98750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:09,490-Speed 3390.90 samples/sec Loss 3.0807 LearningRate 0.0096 Epoch: 19 Global Step: 98760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:12,484-Speed 3421.76 samples/sec Loss 3.2041 LearningRate 0.0096 Epoch: 19 Global Step: 98770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:08:15,449-Speed 3454.78 samples/sec Loss 3.1510 LearningRate 0.0096 Epoch: 19 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:18,435-Speed 3430.33 samples/sec Loss 3.1670 LearningRate 0.0096 Epoch: 19 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:21,426-Speed 3424.20 samples/sec Loss 3.1284 LearningRate 0.0096 Epoch: 19 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:24,422-Speed 3418.08 samples/sec Loss 3.1027 LearningRate 0.0096 Epoch: 19 Global Step: 98810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:27,425-Speed 3412.35 samples/sec Loss 3.1300 LearningRate 0.0095 Epoch: 19 Global Step: 98820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:30,415-Speed 3425.21 samples/sec Loss 3.1778 LearningRate 0.0095 Epoch: 19 Global Step: 98830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:33,390-Speed 3442.86 samples/sec Loss 3.0977 LearningRate 0.0095 Epoch: 19 Global Step: 98840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:36,369-Speed 3438.40 samples/sec Loss 3.2150 LearningRate 0.0095 Epoch: 19 Global Step: 98850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:39,362-Speed 3422.23 samples/sec Loss 3.1248 LearningRate 0.0095 Epoch: 19 Global Step: 98860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:42,346-Speed 3432.04 
samples/sec Loss 3.2876 LearningRate 0.0095 Epoch: 19 Global Step: 98870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:08:45,339-Speed 3422.73 samples/sec Loss 3.1781 LearningRate 0.0095 Epoch: 19 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:08:48,370-Speed 3379.51 samples/sec Loss 3.1315 LearningRate 0.0095 Epoch: 19 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:08:51,379-Speed 3403.13 samples/sec Loss 3.1290 LearningRate 0.0095 Epoch: 19 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:08:54,360-Speed 3436.70 samples/sec Loss 3.1564 LearningRate 0.0095 Epoch: 19 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:08:57,353-Speed 3423.11 samples/sec Loss 3.2153 LearningRate 0.0095 Epoch: 19 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:09:00,330-Speed 3439.92 samples/sec Loss 3.0828 LearningRate 0.0095 Epoch: 19 Global Step: 98930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:09:03,305-Speed 3443.34 samples/sec Loss 3.1891 LearningRate 0.0095 Epoch: 19 Global Step: 98940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:09:06,316-Speed 3400.77 samples/sec Loss 3.1961 LearningRate 0.0095 Epoch: 19 Global Step: 98950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:09:09,304-Speed 3428.52 samples/sec Loss 3.1681 LearningRate 0.0095 Epoch: 19 Global Step: 98960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:12,350-Speed 3362.62 samples/sec Loss 3.1590 LearningRate 0.0094 Epoch: 19 Global Step: 98970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:15,336-Speed 3430.49 samples/sec Loss 3.1669 LearningRate 0.0094 Epoch: 19 Global Step: 98980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:18,403-Speed 3339.48 samples/sec Loss 3.1900 LearningRate 0.0094 Epoch: 19 
Global Step: 98990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:21,418-Speed 3397.97 samples/sec Loss 3.2793 LearningRate 0.0094 Epoch: 19 Global Step: 99000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:24,399-Speed 3435.86 samples/sec Loss 3.2144 LearningRate 0.0094 Epoch: 19 Global Step: 99010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:27,387-Speed 3427.61 samples/sec Loss 3.1523 LearningRate 0.0094 Epoch: 19 Global Step: 99020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:30,433-Speed 3362.48 samples/sec Loss 3.1757 LearningRate 0.0094 Epoch: 19 Global Step: 99030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:33,487-Speed 3354.71 samples/sec Loss 3.1509 LearningRate 0.0094 Epoch: 19 Global Step: 99040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:36,500-Speed 3398.50 samples/sec Loss 3.0807 LearningRate 0.0094 Epoch: 19 Global Step: 99050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:09:39,478-Speed 3440.12 samples/sec Loss 3.2386 LearningRate 0.0094 Epoch: 19 Global Step: 99060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:09:42,458-Speed 3436.36 samples/sec Loss 3.1219 LearningRate 0.0094 Epoch: 19 Global Step: 99070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:09:45,467-Speed 3403.86 samples/sec Loss 3.0878 LearningRate 0.0094 Epoch: 19 Global Step: 99080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:09:48,451-Speed 3432.82 samples/sec Loss 3.1904 LearningRate 0.0094 Epoch: 19 Global Step: 99090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:09:51,431-Speed 3437.69 samples/sec Loss 3.2136 LearningRate 0.0094 Epoch: 19 Global Step: 99100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:09:54,420-Speed 3427.60 samples/sec Loss 3.1316 LearningRate 0.0093 Epoch: 19 Global Step: 99110 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-20 03:09:57,396-Speed 3441.28 samples/sec Loss 3.0521 LearningRate 0.0093 Epoch: 19 Global Step: 99120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:00,439-Speed 3365.27 samples/sec Loss 3.1696 LearningRate 0.0093 Epoch: 19 Global Step: 99130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:03,420-Speed 3436.27 samples/sec Loss 3.0069 LearningRate 0.0093 Epoch: 19 Global Step: 99140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:06,404-Speed 3433.23 samples/sec Loss 3.0821 LearningRate 0.0093 Epoch: 19 Global Step: 99150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:09,398-Speed 3421.02 samples/sec Loss 3.0663 LearningRate 0.0093 Epoch: 19 Global Step: 99160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:10:12,412-Speed 3398.28 samples/sec Loss 3.1727 LearningRate 0.0093 Epoch: 19 Global Step: 99170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:10:15,429-Speed 3394.62 samples/sec Loss 3.2164 LearningRate 0.0093 Epoch: 19 Global Step: 99180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:10:18,419-Speed 3426.36 samples/sec Loss 3.0877 LearningRate 0.0093 Epoch: 19 Global Step: 99190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:10:21,427-Speed 3404.66 samples/sec Loss 3.0648 LearningRate 0.0093 Epoch: 19 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:10:24,406-Speed 3438.37 samples/sec Loss 3.0943 LearningRate 0.0093 Epoch: 19 Global Step: 99210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:10:27,388-Speed 3434.70 samples/sec Loss 3.1307 LearningRate 0.0093 Epoch: 19 Global Step: 99220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:10:30,384-Speed 3418.46 samples/sec Loss 3.1500 LearningRate 0.0093 Epoch: 19 Global Step: 99230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:33,366-Speed 3435.23 
samples/sec Loss 3.0242 LearningRate 0.0093 Epoch: 19 Global Step: 99240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:36,442-Speed 3329.84 samples/sec Loss 3.1126 LearningRate 0.0093 Epoch: 19 Global Step: 99250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:39,445-Speed 3411.45 samples/sec Loss 3.1384 LearningRate 0.0092 Epoch: 19 Global Step: 99260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:42,433-Speed 3427.79 samples/sec Loss 3.1763 LearningRate 0.0092 Epoch: 19 Global Step: 99270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:45,415-Speed 3435.14 samples/sec Loss 3.0657 LearningRate 0.0092 Epoch: 19 Global Step: 99280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:10:48,408-Speed 3422.32 samples/sec Loss 3.0750 LearningRate 0.0092 Epoch: 19 Global Step: 99290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:10:51,394-Speed 3429.43 samples/sec Loss 3.1265 LearningRate 0.0092 Epoch: 19 Global Step: 99300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:10:54,375-Speed 3436.87 samples/sec Loss 3.0142 LearningRate 0.0092 Epoch: 19 Global Step: 99310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:10:57,358-Speed 3432.84 samples/sec Loss 3.2295 LearningRate 0.0092 Epoch: 19 Global Step: 99320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:00,345-Speed 3429.95 samples/sec Loss 3.1128 LearningRate 0.0092 Epoch: 19 Global Step: 99330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:03,346-Speed 3412.36 samples/sec Loss 3.2066 LearningRate 0.0092 Epoch: 19 Global Step: 99340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:06,353-Speed 3406.17 samples/sec Loss 3.1627 LearningRate 0.0092 Epoch: 19 Global Step: 99350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:09,386-Speed 3377.09 samples/sec Loss 3.0712 LearningRate 0.0092 Epoch: 19 
Global Step: 99360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:12,386-Speed 3415.02 samples/sec Loss 3.1832 LearningRate 0.0092 Epoch: 19 Global Step: 99370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:15,363-Speed 3440.30 samples/sec Loss 3.1145 LearningRate 0.0092 Epoch: 19 Global Step: 99380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:18,372-Speed 3404.63 samples/sec Loss 3.1535 LearningRate 0.0092 Epoch: 19 Global Step: 99390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:11:21,362-Speed 3425.37 samples/sec Loss 3.1326 LearningRate 0.0092 Epoch: 19 Global Step: 99400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:11:24,387-Speed 3386.37 samples/sec Loss 3.1207 LearningRate 0.0091 Epoch: 19 Global Step: 99410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:11:27,443-Speed 3351.50 samples/sec Loss 3.0903 LearningRate 0.0091 Epoch: 19 Global Step: 99420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:30,449-Speed 3407.24 samples/sec Loss 3.0388 LearningRate 0.0091 Epoch: 19 Global Step: 99430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:33,572-Speed 3280.28 samples/sec Loss 3.0557 LearningRate 0.0091 Epoch: 19 Global Step: 99440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:36,648-Speed 3329.95 samples/sec Loss 3.0538 LearningRate 0.0091 Epoch: 19 Global Step: 99450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:39,648-Speed 3414.91 samples/sec Loss 3.2692 LearningRate 0.0091 Epoch: 19 Global Step: 99460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:42,718-Speed 3336.32 samples/sec Loss 3.2331 LearningRate 0.0091 Epoch: 19 Global Step: 99470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:45,759-Speed 3367.95 samples/sec Loss 3.1505 LearningRate 0.0091 Epoch: 19 Global Step: 99480 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-01-20 03:11:48,866-Speed 3296.77 samples/sec Loss 3.1189 LearningRate 0.0091 Epoch: 19 Global Step: 99490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:51,927-Speed 3345.74 samples/sec Loss 3.0722 LearningRate 0.0091 Epoch: 19 Global Step: 99500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:54,907-Speed 3436.65 samples/sec Loss 3.1160 LearningRate 0.0091 Epoch: 19 Global Step: 99510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:11:57,901-Speed 3420.81 samples/sec Loss 3.2785 LearningRate 0.0091 Epoch: 19 Global Step: 99520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:00,894-Speed 3422.15 samples/sec Loss 3.2631 LearningRate 0.0091 Epoch: 19 Global Step: 99530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:03,878-Speed 3433.48 samples/sec Loss 3.1171 LearningRate 0.0091 Epoch: 19 Global Step: 99540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:06,864-Speed 3429.85 samples/sec Loss 3.1239 LearningRate 0.0091 Epoch: 19 Global Step: 99550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:09,925-Speed 3346.18 samples/sec Loss 3.1807 LearningRate 0.0090 Epoch: 19 Global Step: 99560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:12,909-Speed 3432.39 samples/sec Loss 3.1957 LearningRate 0.0090 Epoch: 19 Global Step: 99570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:15,920-Speed 3402.52 samples/sec Loss 3.0524 LearningRate 0.0090 Epoch: 19 Global Step: 99580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:18,988-Speed 3338.55 samples/sec Loss 3.1121 LearningRate 0.0090 Epoch: 19 Global Step: 99590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:22,072-Speed 3321.12 samples/sec Loss 3.1962 LearningRate 0.0090 Epoch: 19 Global Step: 99600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:25,153-Speed 3324.29 
samples/sec Loss 3.1136 LearningRate 0.0090 Epoch: 19 Global Step: 99610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:28,155-Speed 3412.03 samples/sec Loss 3.1975 LearningRate 0.0090 Epoch: 19 Global Step: 99620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:12:31,137-Speed 3434.92 samples/sec Loss 3.1647 LearningRate 0.0090 Epoch: 19 Global Step: 99630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:12:34,121-Speed 3433.46 samples/sec Loss 3.1553 LearningRate 0.0090 Epoch: 19 Global Step: 99640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:12:37,142-Speed 3389.93 samples/sec Loss 3.0463 LearningRate 0.0090 Epoch: 19 Global Step: 99650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:40,198-Speed 3351.59 samples/sec Loss 3.1110 LearningRate 0.0090 Epoch: 19 Global Step: 99660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:43,271-Speed 3333.09 samples/sec Loss 3.1904 LearningRate 0.0090 Epoch: 19 Global Step: 99670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:46,268-Speed 3418.01 samples/sec Loss 3.1432 LearningRate 0.0090 Epoch: 19 Global Step: 99680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:49,307-Speed 3369.71 samples/sec Loss 3.1173 LearningRate 0.0090 Epoch: 19 Global Step: 99690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:52,334-Speed 3383.44 samples/sec Loss 3.1131 LearningRate 0.0090 Epoch: 19 Global Step: 99700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:55,325-Speed 3424.76 samples/sec Loss 3.2218 LearningRate 0.0089 Epoch: 19 Global Step: 99710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:12:58,402-Speed 3329.28 samples/sec Loss 3.1791 LearningRate 0.0089 Epoch: 19 Global Step: 99720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:01,473-Speed 3335.22 samples/sec Loss 3.3401 LearningRate 0.0089 Epoch: 19 
Global Step: 99730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:04,462-Speed 3427.14 samples/sec Loss 3.1024 LearningRate 0.0089 Epoch: 19 Global Step: 99740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:07,447-Speed 3430.48 samples/sec Loss 3.1062 LearningRate 0.0089 Epoch: 19 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:13:10,433-Speed 3430.73 samples/sec Loss 2.9887 LearningRate 0.0089 Epoch: 19 Global Step: 99760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:13,471-Speed 3371.63 samples/sec Loss 3.0784 LearningRate 0.0089 Epoch: 19 Global Step: 99770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:16,469-Speed 3416.63 samples/sec Loss 3.1049 LearningRate 0.0089 Epoch: 19 Global Step: 99780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:19,477-Speed 3404.58 samples/sec Loss 3.1679 LearningRate 0.0089 Epoch: 19 Global Step: 99790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:22,469-Speed 3423.66 samples/sec Loss 3.0908 LearningRate 0.0089 Epoch: 19 Global Step: 99800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:25,566-Speed 3306.64 samples/sec Loss 3.1127 LearningRate 0.0089 Epoch: 19 Global Step: 99810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:28,551-Speed 3431.83 samples/sec Loss 3.1084 LearningRate 0.0089 Epoch: 19 Global Step: 99820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:31,536-Speed 3431.42 samples/sec Loss 3.1037 LearningRate 0.0089 Epoch: 19 Global Step: 99830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:34,534-Speed 3416.93 samples/sec Loss 3.1834 LearningRate 0.0089 Epoch: 19 Global Step: 99840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:37,523-Speed 3426.38 samples/sec Loss 2.9722 LearningRate 0.0089 Epoch: 19 Global Step: 99850 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-20 03:13:40,507-Speed 3433.08 samples/sec Loss 3.1541 LearningRate 0.0088 Epoch: 19 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:13:43,470-Speed 3456.11 samples/sec Loss 3.1926 LearningRate 0.0088 Epoch: 19 Global Step: 99870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:46,514-Speed 3365.16 samples/sec Loss 3.0045 LearningRate 0.0088 Epoch: 19 Global Step: 99880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:49,513-Speed 3415.28 samples/sec Loss 3.0975 LearningRate 0.0088 Epoch: 19 Global Step: 99890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:52,567-Speed 3353.46 samples/sec Loss 3.1177 LearningRate 0.0088 Epoch: 19 Global Step: 99900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:55,577-Speed 3403.69 samples/sec Loss 3.1332 LearningRate 0.0088 Epoch: 19 Global Step: 99910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:13:58,568-Speed 3424.56 samples/sec Loss 3.2644 LearningRate 0.0088 Epoch: 19 Global Step: 99920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:14:01,603-Speed 3374.08 samples/sec Loss 3.1271 LearningRate 0.0088 Epoch: 19 Global Step: 99930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:14:04,628-Speed 3386.98 samples/sec Loss 3.1061 LearningRate 0.0088 Epoch: 19 Global Step: 99940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:14:07,625-Speed 3417.67 samples/sec Loss 3.1470 LearningRate 0.0088 Epoch: 19 Global Step: 99950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:14:10,639-Speed 3397.54 samples/sec Loss 3.0409 LearningRate 0.0088 Epoch: 19 Global Step: 99960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:14:13,626-Speed 3429.60 samples/sec Loss 3.1731 LearningRate 0.0088 Epoch: 19 Global Step: 99970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:14:16,611-Speed 3430.84 
samples/sec Loss 3.1947 LearningRate 0.0088 Epoch: 19 Global Step: 99980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:14:19,608-Speed 3418.24 samples/sec Loss 3.0155 LearningRate 0.0088 Epoch: 19 Global Step: 99990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:14:22,618-Speed 3403.12 samples/sec Loss 3.1903 LearningRate 0.0088 Epoch: 19 Global Step: 100000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:15:05,380-[lfw][100000]XNorm: 22.958105 Training: 2022-01-20 03:15:05,380-[lfw][100000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-01-20 03:15:05,381-[lfw][100000]Accuracy-Highest: 0.99833 Training: 2022-01-20 03:15:54,900-[cfp_fp][100000]XNorm: 21.505291 Training: 2022-01-20 03:15:54,901-[cfp_fp][100000]Accuracy-Flip: 0.98714+-0.00424 Training: 2022-01-20 03:15:54,902-[cfp_fp][100000]Accuracy-Highest: 0.98714 Training: 2022-01-20 03:16:37,573-[agedb_30][100000]XNorm: 22.823724 Training: 2022-01-20 03:16:37,573-[agedb_30][100000]Accuracy-Flip: 0.98133+-0.00614 Training: 2022-01-20 03:16:37,574-[agedb_30][100000]Accuracy-Highest: 0.98433 Training: 2022-01-20 03:16:40,558-Speed 74.24 samples/sec Loss 3.1656 LearningRate 0.0087 Epoch: 19 Global Step: 100010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:16:43,594-Speed 3373.75 samples/sec Loss 3.0618 LearningRate 0.0087 Epoch: 19 Global Step: 100020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:16:46,599-Speed 3407.27 samples/sec Loss 3.2077 LearningRate 0.0087 Epoch: 19 Global Step: 100030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:16:49,627-Speed 3382.82 samples/sec Loss 3.2676 LearningRate 0.0087 Epoch: 19 Global Step: 100040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:16:52,638-Speed 3402.91 samples/sec Loss 3.0621 LearningRate 0.0087 Epoch: 19 Global Step: 100050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:16:55,623-Speed 3430.69 samples/sec Loss 3.0624 
LearningRate 0.0087 Epoch: 19 Global Step: 100060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:16:58,604-Speed 3436.20 samples/sec Loss 3.1550 LearningRate 0.0087 Epoch: 19 Global Step: 100070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:01,591-Speed 3428.98 samples/sec Loss 3.0746 LearningRate 0.0087 Epoch: 19 Global Step: 100080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:04,574-Speed 3434.08 samples/sec Loss 3.0864 LearningRate 0.0087 Epoch: 19 Global Step: 100090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:07,551-Speed 3440.55 samples/sec Loss 3.1075 LearningRate 0.0087 Epoch: 19 Global Step: 100100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:10,537-Speed 3429.81 samples/sec Loss 3.1200 LearningRate 0.0087 Epoch: 19 Global Step: 100110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:13,553-Speed 3396.22 samples/sec Loss 3.2024 LearningRate 0.0087 Epoch: 19 Global Step: 100120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:16,535-Speed 3435.25 samples/sec Loss 3.2101 LearningRate 0.0087 Epoch: 19 Global Step: 100130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:19,564-Speed 3382.02 samples/sec Loss 2.9979 LearningRate 0.0087 Epoch: 19 Global Step: 100140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:22,559-Speed 3418.95 samples/sec Loss 3.0931 LearningRate 0.0087 Epoch: 19 Global Step: 100150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:25,571-Speed 3401.25 samples/sec Loss 3.1036 LearningRate 0.0086 Epoch: 19 Global Step: 100160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:28,646-Speed 3330.96 samples/sec Loss 3.0694 LearningRate 0.0086 Epoch: 19 Global Step: 100170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:31,624-Speed 3439.63 samples/sec Loss 3.0510 LearningRate 0.0086 Epoch: 19 Global Step: 
100180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:34,599-Speed 3443.13 samples/sec Loss 3.2272 LearningRate 0.0086 Epoch: 19 Global Step: 100190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:37,696-Speed 3306.73 samples/sec Loss 3.1168 LearningRate 0.0086 Epoch: 19 Global Step: 100200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:40,798-Speed 3302.56 samples/sec Loss 3.0782 LearningRate 0.0086 Epoch: 19 Global Step: 100210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:43,849-Speed 3357.10 samples/sec Loss 3.0336 LearningRate 0.0086 Epoch: 19 Global Step: 100220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:46,836-Speed 3429.26 samples/sec Loss 3.1084 LearningRate 0.0086 Epoch: 19 Global Step: 100230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:49,847-Speed 3401.54 samples/sec Loss 3.2023 LearningRate 0.0086 Epoch: 19 Global Step: 100240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:17:52,820-Speed 3445.86 samples/sec Loss 3.0874 LearningRate 0.0086 Epoch: 19 Global Step: 100250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:55,849-Speed 3381.39 samples/sec Loss 3.1149 LearningRate 0.0086 Epoch: 19 Global Step: 100260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:17:58,884-Speed 3375.23 samples/sec Loss 3.0538 LearningRate 0.0086 Epoch: 19 Global Step: 100270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:01,916-Speed 3377.47 samples/sec Loss 3.0220 LearningRate 0.0086 Epoch: 19 Global Step: 100280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:04,999-Speed 3323.04 samples/sec Loss 3.1563 LearningRate 0.0086 Epoch: 19 Global Step: 100290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:08,002-Speed 3410.32 samples/sec Loss 3.1569 LearningRate 0.0086 Epoch: 19 Global Step: 100300 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-20 03:18:10,986-Speed 3432.70 samples/sec Loss 3.0313 LearningRate 0.0085 Epoch: 19 Global Step: 100310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:13,996-Speed 3403.38 samples/sec Loss 3.0986 LearningRate 0.0085 Epoch: 19 Global Step: 100320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:17,014-Speed 3393.33 samples/sec Loss 3.1950 LearningRate 0.0085 Epoch: 19 Global Step: 100330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:20,014-Speed 3415.09 samples/sec Loss 3.0611 LearningRate 0.0085 Epoch: 19 Global Step: 100340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:22,995-Speed 3435.34 samples/sec Loss 3.0474 LearningRate 0.0085 Epoch: 19 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:18:25,992-Speed 3417.90 samples/sec Loss 3.1743 LearningRate 0.0085 Epoch: 19 Global Step: 100360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:18:29,061-Speed 3337.86 samples/sec Loss 3.1101 LearningRate 0.0085 Epoch: 19 Global Step: 100370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:18:32,208-Speed 3253.77 samples/sec Loss 3.1949 LearningRate 0.0085 Epoch: 19 Global Step: 100380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:35,223-Speed 3397.48 samples/sec Loss 3.1631 LearningRate 0.0085 Epoch: 19 Global Step: 100390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:38,235-Speed 3400.46 samples/sec Loss 3.1050 LearningRate 0.0085 Epoch: 19 Global Step: 100400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:41,217-Speed 3435.17 samples/sec Loss 3.0490 LearningRate 0.0085 Epoch: 19 Global Step: 100410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:18:44,199-Speed 3434.97 samples/sec Loss 3.0221 LearningRate 0.0085 Epoch: 19 Global Step: 100420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 
03:18:47,162-Speed 3456.16 samples/sec Loss 3.1251 LearningRate 0.0085 Epoch: 19 Global Step: 100430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:18:50,159-Speed 3419.86 samples/sec Loss 3.0746 LearningRate 0.0085 Epoch: 19 Global Step: 100440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:18:53,178-Speed 3392.77 samples/sec Loss 3.1078 LearningRate 0.0085 Epoch: 19 Global Step: 100450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:18:56,160-Speed 3434.64 samples/sec Loss 3.0746 LearningRate 0.0084 Epoch: 19 Global Step: 100460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:18:59,136-Speed 3441.64 samples/sec Loss 3.1901 LearningRate 0.0084 Epoch: 19 Global Step: 100470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:19:02,123-Speed 3428.45 samples/sec Loss 2.9631 LearningRate 0.0084 Epoch: 19 Global Step: 100480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:19:05,123-Speed 3415.22 samples/sec Loss 3.0466 LearningRate 0.0084 Epoch: 19 Global Step: 100490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:19:08,107-Speed 3432.45 samples/sec Loss 3.0392 LearningRate 0.0084 Epoch: 19 Global Step: 100500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:19:11,137-Speed 3380.23 samples/sec Loss 3.1105 LearningRate 0.0084 Epoch: 19 Global Step: 100510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:19:14,136-Speed 3416.10 samples/sec Loss 3.0546 LearningRate 0.0084 Epoch: 19 Global Step: 100520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:19:17,116-Speed 3436.32 samples/sec Loss 2.9680 LearningRate 0.0084 Epoch: 19 Global Step: 100530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:20,133-Speed 3396.13 samples/sec Loss 3.1578 LearningRate 0.0084 Epoch: 19 Global Step: 100540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:23,114-Speed 3436.04 samples/sec Loss 
3.0462 LearningRate 0.0084 Epoch: 19 Global Step: 100550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:26,096-Speed 3435.00 samples/sec Loss 3.1127 LearningRate 0.0084 Epoch: 19 Global Step: 100560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:29,094-Speed 3416.29 samples/sec Loss 3.0285 LearningRate 0.0084 Epoch: 19 Global Step: 100570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:32,077-Speed 3433.73 samples/sec Loss 3.0914 LearningRate 0.0084 Epoch: 19 Global Step: 100580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:35,070-Speed 3423.06 samples/sec Loss 3.1509 LearningRate 0.0084 Epoch: 19 Global Step: 100590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:38,047-Speed 3439.86 samples/sec Loss 3.0605 LearningRate 0.0084 Epoch: 19 Global Step: 100600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:41,049-Speed 3412.18 samples/sec Loss 3.0873 LearningRate 0.0084 Epoch: 19 Global Step: 100610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:44,027-Speed 3439.87 samples/sec Loss 3.2319 LearningRate 0.0083 Epoch: 19 Global Step: 100620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:19:47,010-Speed 3433.13 samples/sec Loss 3.1593 LearningRate 0.0083 Epoch: 19 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:19:50,021-Speed 3403.10 samples/sec Loss 3.0280 LearningRate 0.0083 Epoch: 19 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:19:53,074-Speed 3354.61 samples/sec Loss 3.0704 LearningRate 0.0083 Epoch: 19 Global Step: 100650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:19:56,130-Speed 3351.73 samples/sec Loss 3.0508 LearningRate 0.0083 Epoch: 19 Global Step: 100660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:19:59,117-Speed 3428.49 samples/sec Loss 3.1074 LearningRate 0.0083 Epoch: 19 Global 
Step: 100670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:20:02,102-Speed 3431.30 samples/sec Loss 3.1100 LearningRate 0.0083 Epoch: 19 Global Step: 100680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:20:05,090-Speed 3429.32 samples/sec Loss 3.0064 LearningRate 0.0083 Epoch: 19 Global Step: 100690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:08,079-Speed 3426.26 samples/sec Loss 3.2241 LearningRate 0.0083 Epoch: 19 Global Step: 100700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:11,066-Speed 3428.49 samples/sec Loss 3.0597 LearningRate 0.0083 Epoch: 19 Global Step: 100710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:14,046-Speed 3437.68 samples/sec Loss 3.2016 LearningRate 0.0083 Epoch: 19 Global Step: 100720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:17,031-Speed 3431.41 samples/sec Loss 3.0314 LearningRate 0.0083 Epoch: 19 Global Step: 100730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:20,152-Speed 3282.41 samples/sec Loss 2.9493 LearningRate 0.0083 Epoch: 19 Global Step: 100740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:23,191-Speed 3369.68 samples/sec Loss 3.0092 LearningRate 0.0083 Epoch: 19 Global Step: 100750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:26,183-Speed 3423.22 samples/sec Loss 3.0628 LearningRate 0.0083 Epoch: 19 Global Step: 100760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:29,171-Speed 3427.92 samples/sec Loss 3.0841 LearningRate 0.0082 Epoch: 19 Global Step: 100770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:32,153-Speed 3435.43 samples/sec Loss 3.1285 LearningRate 0.0082 Epoch: 19 Global Step: 100780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:35,138-Speed 3431.34 samples/sec Loss 3.0514 LearningRate 0.0082 Epoch: 19 Global Step: 100790 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-20 03:20:38,126-Speed 3428.57 samples/sec Loss 3.0484 LearningRate 0.0082 Epoch: 19 Global Step: 100800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:20:41,129-Speed 3410.69 samples/sec Loss 3.0749 LearningRate 0.0082 Epoch: 19 Global Step: 100810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:20:44,111-Speed 3434.12 samples/sec Loss 3.0017 LearningRate 0.0082 Epoch: 19 Global Step: 100820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:20:47,114-Speed 3410.55 samples/sec Loss 3.1216 LearningRate 0.0082 Epoch: 19 Global Step: 100830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:50,167-Speed 3355.61 samples/sec Loss 3.0392 LearningRate 0.0082 Epoch: 19 Global Step: 100840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:53,166-Speed 3414.92 samples/sec Loss 3.0379 LearningRate 0.0082 Epoch: 19 Global Step: 100850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:56,228-Speed 3344.90 samples/sec Loss 3.0588 LearningRate 0.0082 Epoch: 19 Global Step: 100860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:20:59,209-Speed 3436.19 samples/sec Loss 2.9474 LearningRate 0.0082 Epoch: 19 Global Step: 100870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:02,237-Speed 3383.46 samples/sec Loss 3.0499 LearningRate 0.0082 Epoch: 19 Global Step: 100880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:05,220-Speed 3433.38 samples/sec Loss 3.0814 LearningRate 0.0082 Epoch: 19 Global Step: 100890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:08,276-Speed 3351.44 samples/sec Loss 3.0435 LearningRate 0.0082 Epoch: 19 Global Step: 100900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:11,285-Speed 3404.27 samples/sec Loss 3.0343 LearningRate 0.0082 Epoch: 19 Global Step: 100910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 
03:21:14,403-Speed 3285.14 samples/sec Loss 3.0787 LearningRate 0.0082 Epoch: 19 Global Step: 100920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:17,380-Speed 3440.81 samples/sec Loss 3.0252 LearningRate 0.0081 Epoch: 19 Global Step: 100930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:20,360-Speed 3437.06 samples/sec Loss 2.9472 LearningRate 0.0081 Epoch: 19 Global Step: 100940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:23,461-Speed 3302.13 samples/sec Loss 3.0435 LearningRate 0.0081 Epoch: 19 Global Step: 100950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:26,618-Speed 3245.15 samples/sec Loss 3.0069 LearningRate 0.0081 Epoch: 19 Global Step: 100960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:29,750-Speed 3327.65 samples/sec Loss 3.0719 LearningRate 0.0081 Epoch: 19 Global Step: 100970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:32,743-Speed 3421.91 samples/sec Loss 3.0369 LearningRate 0.0081 Epoch: 19 Global Step: 100980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:35,772-Speed 3435.61 samples/sec Loss 3.0785 LearningRate 0.0081 Epoch: 19 Global Step: 100990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:21:38,789-Speed 3394.31 samples/sec Loss 3.1025 LearningRate 0.0081 Epoch: 19 Global Step: 101000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:21:41,790-Speed 3413.55 samples/sec Loss 3.0429 LearningRate 0.0081 Epoch: 19 Global Step: 101010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:21:44,852-Speed 3399.64 samples/sec Loss 3.0434 LearningRate 0.0081 Epoch: 19 Global Step: 101020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:21:47,865-Speed 3399.74 samples/sec Loss 3.0196 LearningRate 0.0081 Epoch: 19 Global Step: 101030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:21:50,856-Speed 3440.22 samples/sec Loss 
3.0903 LearningRate 0.0081 Epoch: 19 Global Step: 101040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:21:53,833-Speed 3439.97 samples/sec Loss 2.9324 LearningRate 0.0081 Epoch: 19 Global Step: 101050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:21:56,814-Speed 3436.28 samples/sec Loss 3.0351 LearningRate 0.0081 Epoch: 19 Global Step: 101060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:21:59,791-Speed 3442.37 samples/sec Loss 3.0108 LearningRate 0.0081 Epoch: 19 Global Step: 101070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:22:02,787-Speed 3417.71 samples/sec Loss 3.2250 LearningRate 0.0081 Epoch: 19 Global Step: 101080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:22:05,847-Speed 3348.14 samples/sec Loss 3.1270 LearningRate 0.0080 Epoch: 19 Global Step: 101090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:22:08,875-Speed 3383.01 samples/sec Loss 3.0869 LearningRate 0.0080 Epoch: 19 Global Step: 101100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:11,854-Speed 3437.20 samples/sec Loss 3.1151 LearningRate 0.0080 Epoch: 19 Global Step: 101110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:14,915-Speed 3347.22 samples/sec Loss 3.0006 LearningRate 0.0080 Epoch: 19 Global Step: 101120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:17,974-Speed 3348.10 samples/sec Loss 3.0396 LearningRate 0.0080 Epoch: 19 Global Step: 101130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:21,109-Speed 3266.66 samples/sec Loss 3.0810 LearningRate 0.0080 Epoch: 19 Global Step: 101140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:24,328-Speed 3182.80 samples/sec Loss 3.1081 LearningRate 0.0080 Epoch: 19 Global Step: 101150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:27,303-Speed 3442.31 samples/sec Loss 3.0780 LearningRate 0.0080 Epoch: 19 Global 
Step: 101160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:40,268-Speed 789.91 samples/sec Loss 2.3590 LearningRate 0.0080 Epoch: 20 Global Step: 101170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:43,340-Speed 3333.77 samples/sec Loss 2.2325 LearningRate 0.0080 Epoch: 20 Global Step: 101180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:46,313-Speed 3445.52 samples/sec Loss 2.3322 LearningRate 0.0080 Epoch: 20 Global Step: 101190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:22:49,316-Speed 3411.06 samples/sec Loss 2.2448 LearningRate 0.0080 Epoch: 20 Global Step: 101200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:22:52,334-Speed 3393.86 samples/sec Loss 2.2413 LearningRate 0.0080 Epoch: 20 Global Step: 101210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:22:55,396-Speed 3344.76 samples/sec Loss 2.3507 LearningRate 0.0080 Epoch: 20 Global Step: 101220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:22:58,421-Speed 3386.56 samples/sec Loss 2.2411 LearningRate 0.0080 Epoch: 20 Global Step: 101230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:23:01,579-Speed 3242.85 samples/sec Loss 2.3297 LearningRate 0.0079 Epoch: 20 Global Step: 101240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:23:04,716-Speed 3265.66 samples/sec Loss 2.2706 LearningRate 0.0079 Epoch: 20 Global Step: 101250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:07,741-Speed 3386.27 samples/sec Loss 2.1830 LearningRate 0.0079 Epoch: 20 Global Step: 101260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:10,775-Speed 3375.33 samples/sec Loss 2.3833 LearningRate 0.0079 Epoch: 20 Global Step: 101270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:13,843-Speed 3339.12 samples/sec Loss 2.3445 LearningRate 0.0079 Epoch: 20 Global Step: 101280 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-20 03:23:16,837-Speed 3420.99 samples/sec Loss 2.3327 LearningRate 0.0079 Epoch: 20 Global Step: 101290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:19,839-Speed 3412.22 samples/sec Loss 2.3931 LearningRate 0.0079 Epoch: 20 Global Step: 101300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:22,921-Speed 3322.89 samples/sec Loss 2.3425 LearningRate 0.0079 Epoch: 20 Global Step: 101310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:25,934-Speed 3399.89 samples/sec Loss 2.2410 LearningRate 0.0079 Epoch: 20 Global Step: 101320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:28,981-Speed 3362.12 samples/sec Loss 2.3804 LearningRate 0.0079 Epoch: 20 Global Step: 101330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:32,060-Speed 3325.48 samples/sec Loss 2.2736 LearningRate 0.0079 Epoch: 20 Global Step: 101340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:35,061-Speed 3414.75 samples/sec Loss 2.2618 LearningRate 0.0079 Epoch: 20 Global Step: 101350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:38,048-Speed 3428.97 samples/sec Loss 2.2621 LearningRate 0.0079 Epoch: 20 Global Step: 101360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:41,031-Speed 3434.37 samples/sec Loss 2.2921 LearningRate 0.0079 Epoch: 20 Global Step: 101370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:44,098-Speed 3339.27 samples/sec Loss 2.3139 LearningRate 0.0079 Epoch: 20 Global Step: 101380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:47,092-Speed 3420.90 samples/sec Loss 2.3424 LearningRate 0.0079 Epoch: 20 Global Step: 101390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:50,105-Speed 3399.13 samples/sec Loss 2.3201 LearningRate 0.0078 Epoch: 20 Global Step: 101400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 
03:23:53,128-Speed 3389.15 samples/sec Loss 2.3578 LearningRate 0.0078 Epoch: 20 Global Step: 101410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:56,110-Speed 3435.19 samples/sec Loss 2.2186 LearningRate 0.0078 Epoch: 20 Global Step: 101420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:23:59,094-Speed 3431.28 samples/sec Loss 2.2817 LearningRate 0.0078 Epoch: 20 Global Step: 101430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:02,103-Speed 3404.63 samples/sec Loss 2.2960 LearningRate 0.0078 Epoch: 20 Global Step: 101440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:05,097-Speed 3420.85 samples/sec Loss 2.3820 LearningRate 0.0078 Epoch: 20 Global Step: 101450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:08,085-Speed 3427.61 samples/sec Loss 2.4936 LearningRate 0.0078 Epoch: 20 Global Step: 101460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:11,079-Speed 3422.09 samples/sec Loss 2.4549 LearningRate 0.0078 Epoch: 20 Global Step: 101470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:14,070-Speed 3424.25 samples/sec Loss 2.4534 LearningRate 0.0078 Epoch: 20 Global Step: 101480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:17,065-Speed 3419.63 samples/sec Loss 2.3406 LearningRate 0.0078 Epoch: 20 Global Step: 101490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:20,056-Speed 3425.35 samples/sec Loss 2.4592 LearningRate 0.0078 Epoch: 20 Global Step: 101500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:23,054-Speed 3416.40 samples/sec Loss 2.3933 LearningRate 0.0078 Epoch: 20 Global Step: 101510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:26,080-Speed 3385.33 samples/sec Loss 2.3925 LearningRate 0.0078 Epoch: 20 Global Step: 101520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:29,068-Speed 3427.64 samples/sec Loss 
2.3965 LearningRate 0.0078 Epoch: 20 Global Step: 101530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:32,097-Speed 3381.32 samples/sec Loss 2.4633 LearningRate 0.0078 Epoch: 20 Global Step: 101540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:35,091-Speed 3420.85 samples/sec Loss 2.3455 LearningRate 0.0078 Epoch: 20 Global Step: 101550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:38,090-Speed 3415.33 samples/sec Loss 2.3389 LearningRate 0.0077 Epoch: 20 Global Step: 101560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:24:41,056-Speed 3452.82 samples/sec Loss 2.4230 LearningRate 0.0077 Epoch: 20 Global Step: 101570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:24:44,041-Speed 3431.81 samples/sec Loss 2.4146 LearningRate 0.0077 Epoch: 20 Global Step: 101580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:24:47,100-Speed 3348.01 samples/sec Loss 2.3807 LearningRate 0.0077 Epoch: 20 Global Step: 101590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:24:50,089-Speed 3427.50 samples/sec Loss 2.3416 LearningRate 0.0077 Epoch: 20 Global Step: 101600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:24:53,117-Speed 3382.13 samples/sec Loss 2.4878 LearningRate 0.0077 Epoch: 20 Global Step: 101610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:24:56,117-Speed 3415.37 samples/sec Loss 2.3398 LearningRate 0.0077 Epoch: 20 Global Step: 101620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:24:59,123-Speed 3407.43 samples/sec Loss 2.3811 LearningRate 0.0077 Epoch: 20 Global Step: 101630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:25:02,221-Speed 3305.80 samples/sec Loss 2.4835 LearningRate 0.0077 Epoch: 20 Global Step: 101640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:25:05,354-Speed 3269.02 samples/sec Loss 2.4518 LearningRate 0.0077 Epoch: 20 Global 
Step: 101650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:25:08,494-Speed 3261.97 samples/sec Loss 2.4239 LearningRate 0.0077 Epoch: 20 Global Step: 101660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-20 03:25:11,520-Speed 3385.23 samples/sec Loss 2.4674 LearningRate 0.0077 Epoch: 20 Global Step: 101670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:14,512-Speed 3424.02 samples/sec Loss 2.3298 LearningRate 0.0077 Epoch: 20 Global Step: 101680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:17,657-Speed 3257.10 samples/sec Loss 2.3349 LearningRate 0.0077 Epoch: 20 Global Step: 101690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:20,703-Speed 3362.55 samples/sec Loss 2.4526 LearningRate 0.0077 Epoch: 20 Global Step: 101700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:23,723-Speed 3391.38 samples/sec Loss 2.4253 LearningRate 0.0077 Epoch: 20 Global Step: 101710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:26,727-Speed 3409.42 samples/sec Loss 2.3951 LearningRate 0.0076 Epoch: 20 Global Step: 101720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:29,727-Speed 3414.55 samples/sec Loss 2.4778 LearningRate 0.0076 Epoch: 20 Global Step: 101730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:32,729-Speed 3411.32 samples/sec Loss 2.4268 LearningRate 0.0076 Epoch: 20 Global Step: 101740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:35,785-Speed 3351.79 samples/sec Loss 2.4192 LearningRate 0.0076 Epoch: 20 Global Step: 101750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:38,789-Speed 3410.80 samples/sec Loss 2.3196 LearningRate 0.0076 Epoch: 20 Global Step: 101760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:41,790-Speed 3412.04 samples/sec Loss 2.4585 LearningRate 0.0076 Epoch: 20 Global Step: 101770 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-20 03:25:44,830-Speed 3369.78 samples/sec Loss 2.3864 LearningRate 0.0076 Epoch: 20 Global Step: 101780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 03:25:47,798-Speed 3450.70 samples/sec Loss 2.4372 LearningRate 0.0076 Epoch: 20 Global Step: 101790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:50,787-Speed 3427.34 samples/sec Loss 2.3298 LearningRate 0.0076 Epoch: 20 Global Step: 101800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:53,774-Speed 3429.23 samples/sec Loss 2.4102 LearningRate 0.0076 Epoch: 20 Global Step: 101810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:56,756-Speed 3434.25 samples/sec Loss 2.3344 LearningRate 0.0076 Epoch: 20 Global Step: 101820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:25:59,765-Speed 3404.52 samples/sec Loss 2.4261 LearningRate 0.0076 Epoch: 20 Global Step: 101830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:02,773-Speed 3404.50 samples/sec Loss 2.4134 LearningRate 0.0076 Epoch: 20 Global Step: 101840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:05,787-Speed 3399.04 samples/sec Loss 2.4160 LearningRate 0.0076 Epoch: 20 Global Step: 101850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:08,774-Speed 3429.14 samples/sec Loss 2.3945 LearningRate 0.0076 Epoch: 20 Global Step: 101860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:11,780-Speed 3407.82 samples/sec Loss 2.3780 LearningRate 0.0076 Epoch: 20 Global Step: 101870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:14,790-Speed 3403.08 samples/sec Loss 2.3510 LearningRate 0.0076 Epoch: 20 Global Step: 101880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:17,777-Speed 3428.02 samples/sec Loss 2.5426 LearningRate 0.0075 Epoch: 20 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-20 
03:26:20,751-Speed 3444.94 samples/sec Loss 2.4223 LearningRate 0.0075 Epoch: 20 Global Step: 101900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:23,734-Speed 3433.75 samples/sec Loss 2.4724 LearningRate 0.0075 Epoch: 20 Global Step: 101910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:26,737-Speed 3409.81 samples/sec Loss 2.6237 LearningRate 0.0075 Epoch: 20 Global Step: 101920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:29,731-Speed 3421.92 samples/sec Loss 2.4351 LearningRate 0.0075 Epoch: 20 Global Step: 101930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:32,746-Speed 3397.05 samples/sec Loss 2.4568 LearningRate 0.0075 Epoch: 20 Global Step: 101940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:26:35,734-Speed 3428.34 samples/sec Loss 2.3102 LearningRate 0.0075 Epoch: 20 Global Step: 101950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:26:38,724-Speed 3424.74 samples/sec Loss 2.5634 LearningRate 0.0075 Epoch: 20 Global Step: 101960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:26:41,815-Speed 3313.97 samples/sec Loss 2.4053 LearningRate 0.0075 Epoch: 20 Global Step: 101970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:26:44,897-Speed 3324.25 samples/sec Loss 2.4385 LearningRate 0.0075 Epoch: 20 Global Step: 101980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:26:47,879-Speed 3434.25 samples/sec Loss 2.4989 LearningRate 0.0075 Epoch: 20 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:26:50,853-Speed 3444.27 samples/sec Loss 2.4482 LearningRate 0.0075 Epoch: 20 Global Step: 102000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:27:34,062-[lfw][102000]XNorm: 22.505029 Training: 2022-01-20 03:27:34,063-[lfw][102000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-20 03:27:34,063-[lfw][102000]Accuracy-Highest: 0.99833 Training: 
2022-01-20 03:28:23,990-[cfp_fp][102000]XNorm: 21.187340 Training: 2022-01-20 03:28:23,990-[cfp_fp][102000]Accuracy-Flip: 0.98743+-0.00574 Training: 2022-01-20 03:28:23,991-[cfp_fp][102000]Accuracy-Highest: 0.98743 Training: 2022-01-20 03:29:06,892-[agedb_30][102000]XNorm: 22.582867 Training: 2022-01-20 03:29:06,892-[agedb_30][102000]Accuracy-Flip: 0.98267+-0.00597 Training: 2022-01-20 03:29:06,893-[agedb_30][102000]Accuracy-Highest: 0.98433 Training: 2022-01-20 03:29:09,908-Speed 73.64 samples/sec Loss 2.4529 LearningRate 0.0075 Epoch: 20 Global Step: 102010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:29:12,882-Speed 3443.33 samples/sec Loss 2.4281 LearningRate 0.0075 Epoch: 20 Global Step: 102020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-20 03:29:16,021-Speed 3262.82 samples/sec Loss 2.5044 LearningRate 0.0075 Epoch: 20 Global Step: 102030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:29:19,000-Speed 3438.27 samples/sec Loss 2.3747 LearningRate 0.0075 Epoch: 20 Global Step: 102040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:29:21,996-Speed 3419.27 samples/sec Loss 2.4742 LearningRate 0.0074 Epoch: 20 Global Step: 102050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:29:25,171-Speed 3225.46 samples/sec Loss 2.4580 LearningRate 0.0074 Epoch: 20 Global Step: 102060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:29:28,176-Speed 3409.82 samples/sec Loss 2.4632 LearningRate 0.0074 Epoch: 20 Global Step: 102070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:29:31,179-Speed 3410.86 samples/sec Loss 2.4671 LearningRate 0.0074 Epoch: 20 Global Step: 102080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:29:34,183-Speed 3410.44 samples/sec Loss 2.5039 LearningRate 0.0074 Epoch: 20 Global Step: 102090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:29:37,177-Speed 3420.99 samples/sec Loss 2.5270 LearningRate 
0.0074 Epoch: 20 Global Step: 102100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:29:40,158-Speed 3436.12 samples/sec Loss 2.4186 LearningRate 0.0074 Epoch: 20 Global Step: 102110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:29:43,171-Speed 3398.71 samples/sec Loss 2.5531 LearningRate 0.0074 Epoch: 20 Global Step: 102120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:29:46,158-Speed 3429.05 samples/sec Loss 2.3894 LearningRate 0.0074 Epoch: 20 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:29:49,140-Speed 3435.67 samples/sec Loss 2.5858 LearningRate 0.0074 Epoch: 20 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:29:52,121-Speed 3435.54 samples/sec Loss 2.5472 LearningRate 0.0074 Epoch: 20 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:29:55,104-Speed 3434.02 samples/sec Loss 2.4224 LearningRate 0.0074 Epoch: 20 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:29:58,416-Speed 3092.81 samples/sec Loss 2.4111 LearningRate 0.0074 Epoch: 20 Global Step: 102170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:30:02,618-Speed 2437.14 samples/sec Loss 2.4203 LearningRate 0.0074 Epoch: 20 Global Step: 102180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:30:05,623-Speed 3408.40 samples/sec Loss 2.4343 LearningRate 0.0074 Epoch: 20 Global Step: 102190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:30:08,619-Speed 3419.40 samples/sec Loss 2.5569 LearningRate 0.0074 Epoch: 20 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:30:11,624-Speed 3408.64 samples/sec Loss 2.5295 LearningRate 0.0073 Epoch: 20 Global Step: 102210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:30:14,605-Speed 3436.08 samples/sec Loss 2.4166 LearningRate 0.0073 Epoch: 20 Global Step: 102220 Fp16 
Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:17,621-Speed 3395.06 samples/sec Loss 2.5206 LearningRate 0.0073 Epoch: 20 Global Step: 102230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:20,655-Speed 3376.23 samples/sec Loss 2.5797 LearningRate 0.0073 Epoch: 20 Global Step: 102240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:23,644-Speed 3427.34 samples/sec Loss 2.3723 LearningRate 0.0073 Epoch: 20 Global Step: 102250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:26,657-Speed 3400.13 samples/sec Loss 2.5433 LearningRate 0.0073 Epoch: 20 Global Step: 102260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:29,643-Speed 3429.84 samples/sec Loss 2.4547 LearningRate 0.0073 Epoch: 20 Global Step: 102270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:32,625-Speed 3434.89 samples/sec Loss 2.5009 LearningRate 0.0073 Epoch: 20 Global Step: 102280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:35,660-Speed 3374.67 samples/sec Loss 2.5315 LearningRate 0.0073 Epoch: 20 Global Step: 102290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:38,665-Speed 3408.88 samples/sec Loss 2.3829 LearningRate 0.0073 Epoch: 20 Global Step: 102300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:41,673-Speed 3404.38 samples/sec Loss 2.4271 LearningRate 0.0073 Epoch: 20 Global Step: 102310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:30:44,663-Speed 3427.03 samples/sec Loss 2.5370 LearningRate 0.0073 Epoch: 20 Global Step: 102320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:30:47,659-Speed 3418.22 samples/sec Loss 2.6104 LearningRate 0.0073 Epoch: 20 Global Step: 102330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:30:50,665-Speed 3407.75 samples/sec Loss 2.5554 LearningRate 0.0073 Epoch: 20 Global Step: 102340 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-01-20 03:30:53,686-Speed 3390.19 samples/sec Loss 2.5369 LearningRate 0.0073 Epoch: 20 Global Step: 102350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:30:56,707-Speed 3391.07 samples/sec Loss 2.4532 LearningRate 0.0073 Epoch: 20 Global Step: 102360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:30:59,740-Speed 3376.66 samples/sec Loss 2.4627 LearningRate 0.0073 Epoch: 20 Global Step: 102370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:02,754-Speed 3398.06 samples/sec Loss 2.4261 LearningRate 0.0072 Epoch: 20 Global Step: 102380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:05,791-Speed 3373.19 samples/sec Loss 2.4554 LearningRate 0.0072 Epoch: 20 Global Step: 102390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:08,769-Speed 3438.52 samples/sec Loss 2.5506 LearningRate 0.0072 Epoch: 20 Global Step: 102400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:11,754-Speed 3431.41 samples/sec Loss 2.4844 LearningRate 0.0072 Epoch: 20 Global Step: 102410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:14,782-Speed 3383.23 samples/sec Loss 2.5645 LearningRate 0.0072 Epoch: 20 Global Step: 102420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:17,932-Speed 3251.65 samples/sec Loss 2.5645 LearningRate 0.0072 Epoch: 20 Global Step: 102430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:21,057-Speed 3277.48 samples/sec Loss 2.6355 LearningRate 0.0072 Epoch: 20 Global Step: 102440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:24,184-Speed 3276.04 samples/sec Loss 2.6092 LearningRate 0.0072 Epoch: 20 Global Step: 102450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:27,181-Speed 3417.19 samples/sec Loss 2.5387 LearningRate 0.0072 Epoch: 20 Global Step: 102460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:30,163-Speed 
3435.31 samples/sec Loss 2.5394 LearningRate 0.0072 Epoch: 20 Global Step: 102470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:33,154-Speed 3424.32 samples/sec Loss 2.5293 LearningRate 0.0072 Epoch: 20 Global Step: 102480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:36,216-Speed 3344.31 samples/sec Loss 2.5323 LearningRate 0.0072 Epoch: 20 Global Step: 102490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:39,238-Speed 3390.85 samples/sec Loss 2.5218 LearningRate 0.0072 Epoch: 20 Global Step: 102500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:42,228-Speed 3425.73 samples/sec Loss 2.5335 LearningRate 0.0072 Epoch: 20 Global Step: 102510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:31:45,225-Speed 3417.86 samples/sec Loss 2.4909 LearningRate 0.0072 Epoch: 20 Global Step: 102520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:48,212-Speed 3428.85 samples/sec Loss 2.5444 LearningRate 0.0072 Epoch: 20 Global Step: 102530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:51,194-Speed 3434.30 samples/sec Loss 2.5453 LearningRate 0.0072 Epoch: 20 Global Step: 102540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:54,203-Speed 3404.76 samples/sec Loss 2.5603 LearningRate 0.0071 Epoch: 20 Global Step: 102550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:31:57,179-Speed 3440.96 samples/sec Loss 2.4916 LearningRate 0.0071 Epoch: 20 Global Step: 102560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:00,164-Speed 3431.59 samples/sec Loss 2.4671 LearningRate 0.0071 Epoch: 20 Global Step: 102570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:03,146-Speed 3434.98 samples/sec Loss 2.5678 LearningRate 0.0071 Epoch: 20 Global Step: 102580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:06,122-Speed 3441.92 samples/sec Loss 2.4839 
LearningRate 0.0071 Epoch: 20 Global Step: 102590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:09,099-Speed 3440.42 samples/sec Loss 2.6384 LearningRate 0.0071 Epoch: 20 Global Step: 102600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:12,099-Speed 3413.92 samples/sec Loss 2.5814 LearningRate 0.0071 Epoch: 20 Global Step: 102610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:15,082-Speed 3434.38 samples/sec Loss 2.5110 LearningRate 0.0071 Epoch: 20 Global Step: 102620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:32:18,045-Speed 3457.02 samples/sec Loss 2.5725 LearningRate 0.0071 Epoch: 20 Global Step: 102630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:21,039-Speed 3420.75 samples/sec Loss 2.5236 LearningRate 0.0071 Epoch: 20 Global Step: 102640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:24,036-Speed 3418.09 samples/sec Loss 2.4967 LearningRate 0.0071 Epoch: 20 Global Step: 102650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:27,019-Speed 3433.78 samples/sec Loss 2.5973 LearningRate 0.0071 Epoch: 20 Global Step: 102660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:30,003-Speed 3432.64 samples/sec Loss 2.5300 LearningRate 0.0071 Epoch: 20 Global Step: 102670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:32,986-Speed 3433.89 samples/sec Loss 2.4939 LearningRate 0.0071 Epoch: 20 Global Step: 102680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:35,969-Speed 3433.34 samples/sec Loss 2.5479 LearningRate 0.0071 Epoch: 20 Global Step: 102690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:38,950-Speed 3435.78 samples/sec Loss 2.5810 LearningRate 0.0071 Epoch: 20 Global Step: 102700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:41,931-Speed 3436.89 samples/sec Loss 2.6029 LearningRate 0.0070 Epoch: 20 Global Step: 
102710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:44,913-Speed 3435.03 samples/sec Loss 2.6535 LearningRate 0.0070 Epoch: 20 Global Step: 102720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:47,916-Speed 3410.86 samples/sec Loss 2.5502 LearningRate 0.0070 Epoch: 20 Global Step: 102730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:32:50,938-Speed 3388.77 samples/sec Loss 2.5367 LearningRate 0.0070 Epoch: 20 Global Step: 102740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:32:53,903-Speed 3454.45 samples/sec Loss 2.4736 LearningRate 0.0070 Epoch: 20 Global Step: 102750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:56,886-Speed 3433.87 samples/sec Loss 2.5781 LearningRate 0.0070 Epoch: 20 Global Step: 102760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:32:59,864-Speed 3439.64 samples/sec Loss 2.5080 LearningRate 0.0070 Epoch: 20 Global Step: 102770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:02,879-Speed 3397.38 samples/sec Loss 2.5713 LearningRate 0.0070 Epoch: 20 Global Step: 102780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:05,881-Speed 3411.06 samples/sec Loss 2.6097 LearningRate 0.0070 Epoch: 20 Global Step: 102790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:08,892-Speed 3402.40 samples/sec Loss 2.6734 LearningRate 0.0070 Epoch: 20 Global Step: 102800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:11,895-Speed 3411.46 samples/sec Loss 2.4872 LearningRate 0.0070 Epoch: 20 Global Step: 102810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:14,883-Speed 3427.86 samples/sec Loss 2.6847 LearningRate 0.0070 Epoch: 20 Global Step: 102820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:17,860-Speed 3441.34 samples/sec Loss 2.4974 LearningRate 0.0070 Epoch: 20 Global Step: 102830 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-01-20 03:33:20,837-Speed 3440.37 samples/sec Loss 2.5902 LearningRate 0.0070 Epoch: 20 Global Step: 102840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:23,817-Speed 3436.60 samples/sec Loss 2.5605 LearningRate 0.0070 Epoch: 20 Global Step: 102850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:33:26,802-Speed 3431.43 samples/sec Loss 2.6759 LearningRate 0.0070 Epoch: 20 Global Step: 102860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:33:29,782-Speed 3437.81 samples/sec Loss 2.5150 LearningRate 0.0070 Epoch: 20 Global Step: 102870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:33:32,750-Speed 3450.62 samples/sec Loss 2.5874 LearningRate 0.0069 Epoch: 20 Global Step: 102880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:35,825-Speed 3330.86 samples/sec Loss 2.4484 LearningRate 0.0069 Epoch: 20 Global Step: 102890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:38,818-Speed 3423.32 samples/sec Loss 2.6220 LearningRate 0.0069 Epoch: 20 Global Step: 102900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:41,899-Speed 3323.66 samples/sec Loss 2.5516 LearningRate 0.0069 Epoch: 20 Global Step: 102910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:44,957-Speed 3350.68 samples/sec Loss 2.6004 LearningRate 0.0069 Epoch: 20 Global Step: 102920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:47,953-Speed 3418.28 samples/sec Loss 2.3764 LearningRate 0.0069 Epoch: 20 Global Step: 102930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:51,062-Speed 3294.17 samples/sec Loss 2.5508 LearningRate 0.0069 Epoch: 20 Global Step: 102940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:33:54,149-Speed 3318.61 samples/sec Loss 2.5685 LearningRate 0.0069 Epoch: 20 Global Step: 102950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
03:33:57,163-Speed 3397.85 samples/sec Loss 2.5629 LearningRate 0.0069 Epoch: 20 Global Step: 102960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:00,171-Speed 3405.54 samples/sec Loss 2.5370 LearningRate 0.0069 Epoch: 20 Global Step: 102970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:03,184-Speed 3399.34 samples/sec Loss 2.5721 LearningRate 0.0069 Epoch: 20 Global Step: 102980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:34:06,179-Speed 3420.15 samples/sec Loss 2.6521 LearningRate 0.0069 Epoch: 20 Global Step: 102990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:34:09,161-Speed 3434.58 samples/sec Loss 2.5027 LearningRate 0.0069 Epoch: 20 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:34:12,144-Speed 3434.15 samples/sec Loss 2.6266 LearningRate 0.0069 Epoch: 20 Global Step: 103010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:34:15,189-Speed 3363.49 samples/sec Loss 2.5255 LearningRate 0.0069 Epoch: 20 Global Step: 103020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:34:18,199-Speed 3402.40 samples/sec Loss 2.5665 LearningRate 0.0069 Epoch: 20 Global Step: 103030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:34:21,244-Speed 3364.22 samples/sec Loss 2.5135 LearningRate 0.0069 Epoch: 20 Global Step: 103040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:34:24,334-Speed 3315.56 samples/sec Loss 2.5166 LearningRate 0.0068 Epoch: 20 Global Step: 103050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:34:27,325-Speed 3424.94 samples/sec Loss 2.5197 LearningRate 0.0068 Epoch: 20 Global Step: 103060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:30,301-Speed 3441.39 samples/sec Loss 2.6561 LearningRate 0.0068 Epoch: 20 Global Step: 103070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:33,281-Speed 3437.30 samples/sec Loss 
2.5557 LearningRate 0.0068 Epoch: 20 Global Step: 103080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:36,345-Speed 3343.54 samples/sec Loss 2.4679 LearningRate 0.0068 Epoch: 20 Global Step: 103090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:39,324-Speed 3438.22 samples/sec Loss 2.6570 LearningRate 0.0068 Epoch: 20 Global Step: 103100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:42,311-Speed 3428.85 samples/sec Loss 2.4956 LearningRate 0.0068 Epoch: 20 Global Step: 103110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:45,299-Speed 3428.40 samples/sec Loss 2.5539 LearningRate 0.0068 Epoch: 20 Global Step: 103120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:48,287-Speed 3427.33 samples/sec Loss 2.5871 LearningRate 0.0068 Epoch: 20 Global Step: 103130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:51,275-Speed 3428.28 samples/sec Loss 2.6407 LearningRate 0.0068 Epoch: 20 Global Step: 103140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:54,304-Speed 3381.78 samples/sec Loss 2.4597 LearningRate 0.0068 Epoch: 20 Global Step: 103150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:34:57,332-Speed 3381.71 samples/sec Loss 2.5659 LearningRate 0.0068 Epoch: 20 Global Step: 103160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:00,371-Speed 3370.77 samples/sec Loss 2.5090 LearningRate 0.0068 Epoch: 20 Global Step: 103170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:03,374-Speed 3411.38 samples/sec Loss 2.5119 LearningRate 0.0068 Epoch: 20 Global Step: 103180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:06,355-Speed 3435.68 samples/sec Loss 2.4915 LearningRate 0.0068 Epoch: 20 Global Step: 103190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:09,447-Speed 3313.16 samples/sec Loss 2.5170 LearningRate 0.0068 Epoch: 20 Global 
Step: 103200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:12,540-Speed 3311.59 samples/sec Loss 2.4538 LearningRate 0.0068 Epoch: 20 Global Step: 103210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:15,549-Speed 3404.27 samples/sec Loss 2.4794 LearningRate 0.0067 Epoch: 20 Global Step: 103220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:18,565-Speed 3395.24 samples/sec Loss 2.4892 LearningRate 0.0067 Epoch: 20 Global Step: 103230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:21,569-Speed 3410.38 samples/sec Loss 2.5766 LearningRate 0.0067 Epoch: 20 Global Step: 103240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:24,577-Speed 3405.04 samples/sec Loss 2.5401 LearningRate 0.0067 Epoch: 20 Global Step: 103250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:27,565-Speed 3428.57 samples/sec Loss 2.5887 LearningRate 0.0067 Epoch: 20 Global Step: 103260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:35:30,558-Speed 3421.34 samples/sec Loss 2.5463 LearningRate 0.0067 Epoch: 20 Global Step: 103270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:35:33,549-Speed 3424.91 samples/sec Loss 2.5512 LearningRate 0.0067 Epoch: 20 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:35:36,559-Speed 3402.61 samples/sec Loss 2.6137 LearningRate 0.0067 Epoch: 20 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:35:39,548-Speed 3427.35 samples/sec Loss 2.7466 LearningRate 0.0067 Epoch: 20 Global Step: 103300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:35:42,521-Speed 3444.76 samples/sec Loss 2.6196 LearningRate 0.0067 Epoch: 20 Global Step: 103310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:45,506-Speed 3431.90 samples/sec Loss 2.6338 LearningRate 0.0067 Epoch: 20 Global Step: 103320 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 03:35:48,607-Speed 3302.61 samples/sec Loss 2.4926 LearningRate 0.0067 Epoch: 20 Global Step: 103330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:51,607-Speed 3413.52 samples/sec Loss 2.5555 LearningRate 0.0067 Epoch: 20 Global Step: 103340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:54,613-Speed 3407.84 samples/sec Loss 2.6220 LearningRate 0.0067 Epoch: 20 Global Step: 103350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:35:57,611-Speed 3416.43 samples/sec Loss 2.6083 LearningRate 0.0067 Epoch: 20 Global Step: 103360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:00,595-Speed 3432.70 samples/sec Loss 2.5777 LearningRate 0.0067 Epoch: 20 Global Step: 103370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:03,626-Speed 3379.29 samples/sec Loss 2.5173 LearningRate 0.0067 Epoch: 20 Global Step: 103380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:06,624-Speed 3417.09 samples/sec Loss 2.5300 LearningRate 0.0067 Epoch: 20 Global Step: 103390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:09,610-Speed 3430.22 samples/sec Loss 2.5438 LearningRate 0.0066 Epoch: 20 Global Step: 103400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:12,597-Speed 3428.66 samples/sec Loss 2.5327 LearningRate 0.0066 Epoch: 20 Global Step: 103410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:36:15,584-Speed 3428.77 samples/sec Loss 2.5308 LearningRate 0.0066 Epoch: 20 Global Step: 103420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:36:18,560-Speed 3442.73 samples/sec Loss 2.5941 LearningRate 0.0066 Epoch: 20 Global Step: 103430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:21,569-Speed 3403.78 samples/sec Loss 2.6344 LearningRate 0.0066 Epoch: 20 Global Step: 103440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
03:36:24,554-Speed 3431.69 samples/sec Loss 2.7210 LearningRate 0.0066 Epoch: 20 Global Step: 103450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:27,615-Speed 3345.79 samples/sec Loss 2.4858 LearningRate 0.0066 Epoch: 20 Global Step: 103460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:30,618-Speed 3411.27 samples/sec Loss 2.5812 LearningRate 0.0066 Epoch: 20 Global Step: 103470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:33,766-Speed 3253.90 samples/sec Loss 2.5379 LearningRate 0.0066 Epoch: 20 Global Step: 103480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:36,835-Speed 3336.67 samples/sec Loss 2.5880 LearningRate 0.0066 Epoch: 20 Global Step: 103490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:39,826-Speed 3425.33 samples/sec Loss 2.5244 LearningRate 0.0066 Epoch: 20 Global Step: 103500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:42,864-Speed 3370.63 samples/sec Loss 2.5378 LearningRate 0.0066 Epoch: 20 Global Step: 103510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:45,854-Speed 3425.76 samples/sec Loss 2.6284 LearningRate 0.0066 Epoch: 20 Global Step: 103520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:36:48,858-Speed 3410.89 samples/sec Loss 2.6746 LearningRate 0.0066 Epoch: 20 Global Step: 103530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:36:51,851-Speed 3421.79 samples/sec Loss 2.5336 LearningRate 0.0066 Epoch: 20 Global Step: 103540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:36:54,833-Speed 3435.10 samples/sec Loss 2.4420 LearningRate 0.0066 Epoch: 20 Global Step: 103550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:36:57,833-Speed 3414.27 samples/sec Loss 2.5659 LearningRate 0.0066 Epoch: 20 Global Step: 103560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:37:00,873-Speed 3369.08 samples/sec Loss 
2.5042 LearningRate 0.0065 Epoch: 20 Global Step: 103570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:03,861-Speed 3428.49 samples/sec Loss 2.6461 LearningRate 0.0065 Epoch: 20 Global Step: 103580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:06,920-Speed 3347.82 samples/sec Loss 2.4412 LearningRate 0.0065 Epoch: 20 Global Step: 103590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:09,964-Speed 3365.54 samples/sec Loss 2.4673 LearningRate 0.0065 Epoch: 20 Global Step: 103600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:12,952-Speed 3428.13 samples/sec Loss 2.4986 LearningRate 0.0065 Epoch: 20 Global Step: 103610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:15,932-Speed 3437.05 samples/sec Loss 2.5236 LearningRate 0.0065 Epoch: 20 Global Step: 103620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:18,913-Speed 3436.48 samples/sec Loss 2.5336 LearningRate 0.0065 Epoch: 20 Global Step: 103630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:21,925-Speed 3400.97 samples/sec Loss 2.6225 LearningRate 0.0065 Epoch: 20 Global Step: 103640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:24,911-Speed 3429.41 samples/sec Loss 2.5785 LearningRate 0.0065 Epoch: 20 Global Step: 103650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:27,899-Speed 3428.07 samples/sec Loss 2.6594 LearningRate 0.0065 Epoch: 20 Global Step: 103660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:30,883-Speed 3432.31 samples/sec Loss 2.5657 LearningRate 0.0065 Epoch: 20 Global Step: 103670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:33,870-Speed 3429.17 samples/sec Loss 2.6603 LearningRate 0.0065 Epoch: 20 Global Step: 103680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:36,961-Speed 3313.74 samples/sec Loss 2.5129 LearningRate 0.0065 Epoch: 20 Global 
Step: 103690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:40,004-Speed 3366.52 samples/sec Loss 2.5499 LearningRate 0.0065 Epoch: 20 Global Step: 103700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:42,987-Speed 3434.14 samples/sec Loss 2.6443 LearningRate 0.0065 Epoch: 20 Global Step: 103710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:46,016-Speed 3382.13 samples/sec Loss 2.5358 LearningRate 0.0065 Epoch: 20 Global Step: 103720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:37:49,135-Speed 3284.46 samples/sec Loss 2.5700 LearningRate 0.0065 Epoch: 20 Global Step: 103730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:52,148-Speed 3398.89 samples/sec Loss 2.4894 LearningRate 0.0065 Epoch: 20 Global Step: 103740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:55,163-Speed 3397.53 samples/sec Loss 2.5924 LearningRate 0.0064 Epoch: 20 Global Step: 103750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:37:58,164-Speed 3413.14 samples/sec Loss 2.6291 LearningRate 0.0064 Epoch: 20 Global Step: 103760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:01,159-Speed 3419.44 samples/sec Loss 2.4520 LearningRate 0.0064 Epoch: 20 Global Step: 103770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:04,219-Speed 3347.90 samples/sec Loss 2.5943 LearningRate 0.0064 Epoch: 20 Global Step: 103780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:07,246-Speed 3383.10 samples/sec Loss 2.6533 LearningRate 0.0064 Epoch: 20 Global Step: 103790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:10,241-Speed 3419.81 samples/sec Loss 2.5981 LearningRate 0.0064 Epoch: 20 Global Step: 103800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:13,235-Speed 3421.82 samples/sec Loss 2.5545 LearningRate 0.0064 Epoch: 20 Global Step: 103810 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 03:38:16,224-Speed 3426.14 samples/sec Loss 2.5151 LearningRate 0.0064 Epoch: 20 Global Step: 103820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:19,214-Speed 3426.03 samples/sec Loss 2.5062 LearningRate 0.0064 Epoch: 20 Global Step: 103830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:38:22,378-Speed 3237.05 samples/sec Loss 2.5528 LearningRate 0.0064 Epoch: 20 Global Step: 103840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:38:25,371-Speed 3422.33 samples/sec Loss 2.6332 LearningRate 0.0064 Epoch: 20 Global Step: 103850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:38:28,360-Speed 3426.81 samples/sec Loss 2.5335 LearningRate 0.0064 Epoch: 20 Global Step: 103860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:31,350-Speed 3425.82 samples/sec Loss 2.5935 LearningRate 0.0064 Epoch: 20 Global Step: 103870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:34,337-Speed 3429.49 samples/sec Loss 2.6292 LearningRate 0.0064 Epoch: 20 Global Step: 103880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:37,331-Speed 3420.22 samples/sec Loss 2.5535 LearningRate 0.0064 Epoch: 20 Global Step: 103890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:40,317-Speed 3431.47 samples/sec Loss 2.5421 LearningRate 0.0064 Epoch: 20 Global Step: 103900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:43,305-Speed 3428.02 samples/sec Loss 2.4955 LearningRate 0.0064 Epoch: 20 Global Step: 103910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:46,295-Speed 3425.27 samples/sec Loss 2.6402 LearningRate 0.0063 Epoch: 20 Global Step: 103920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:49,391-Speed 3308.46 samples/sec Loss 2.6214 LearningRate 0.0063 Epoch: 20 Global Step: 103930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
03:38:52,375-Speed 3432.59 samples/sec Loss 2.6391 LearningRate 0.0063 Epoch: 20 Global Step: 103940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:55,437-Speed 3346.54 samples/sec Loss 2.6809 LearningRate 0.0063 Epoch: 20 Global Step: 103950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:38:58,429-Speed 3422.38 samples/sec Loss 2.5751 LearningRate 0.0063 Epoch: 20 Global Step: 103960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:39:01,419-Speed 3425.49 samples/sec Loss 2.5218 LearningRate 0.0063 Epoch: 20 Global Step: 103970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:39:04,392-Speed 3446.02 samples/sec Loss 2.6042 LearningRate 0.0063 Epoch: 20 Global Step: 103980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:39:07,373-Speed 3435.66 samples/sec Loss 2.6034 LearningRate 0.0063 Epoch: 20 Global Step: 103990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:39:10,360-Speed 3429.62 samples/sec Loss 2.6114 LearningRate 0.0063 Epoch: 20 Global Step: 104000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:39:53,519-[lfw][104000]XNorm: 22.968920 Training: 2022-01-20 03:39:53,520-[lfw][104000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-01-20 03:39:53,520-[lfw][104000]Accuracy-Highest: 0.99833 Training: 2022-01-20 03:40:43,805-[cfp_fp][104000]XNorm: 21.634564 Training: 2022-01-20 03:40:43,806-[cfp_fp][104000]Accuracy-Flip: 0.98771+-0.00520 Training: 2022-01-20 03:40:43,806-[cfp_fp][104000]Accuracy-Highest: 0.98771 Training: 2022-01-20 03:41:26,706-[agedb_30][104000]XNorm: 22.883221 Training: 2022-01-20 03:41:26,707-[agedb_30][104000]Accuracy-Flip: 0.98417+-0.00642 Training: 2022-01-20 03:41:26,707-[agedb_30][104000]Accuracy-Highest: 0.98433 Training: 2022-01-20 03:41:29,701-Speed 73.49 samples/sec Loss 2.6424 LearningRate 0.0063 Epoch: 20 Global Step: 104010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:41:32,677-Speed 
3442.29 samples/sec Loss 2.5187 LearningRate 0.0063 Epoch: 20 Global Step: 104020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:41:35,649-Speed 3446.06 samples/sec Loss 2.5674 LearningRate 0.0063 Epoch: 20 Global Step: 104030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:41:38,640-Speed 3424.75 samples/sec Loss 2.6845 LearningRate 0.0063 Epoch: 20 Global Step: 104040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:41:41,707-Speed 3339.61 samples/sec Loss 2.5974 LearningRate 0.0063 Epoch: 20 Global Step: 104050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:41:44,779-Speed 3334.68 samples/sec Loss 2.4828 LearningRate 0.0063 Epoch: 20 Global Step: 104060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:41:47,814-Speed 3375.40 samples/sec Loss 2.6574 LearningRate 0.0063 Epoch: 20 Global Step: 104070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:41:50,859-Speed 3363.69 samples/sec Loss 2.6192 LearningRate 0.0063 Epoch: 20 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:41:53,844-Speed 3430.71 samples/sec Loss 2.4473 LearningRate 0.0063 Epoch: 20 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:41:56,918-Speed 3332.16 samples/sec Loss 2.6999 LearningRate 0.0062 Epoch: 20 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:42:00,029-Speed 3292.52 samples/sec Loss 2.5975 LearningRate 0.0062 Epoch: 20 Global Step: 104110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:42:03,156-Speed 3276.40 samples/sec Loss 2.5696 LearningRate 0.0062 Epoch: 20 Global Step: 104120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:42:06,169-Speed 3398.65 samples/sec Loss 2.4750 LearningRate 0.0062 Epoch: 20 Global Step: 104130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:42:09,188-Speed 3393.47 samples/sec Loss 2.4549 
LearningRate 0.0062 Epoch: 20 Global Step: 104140 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:12,191-Speed 3411.03 samples/sec Loss 2.5853 LearningRate 0.0062 Epoch: 20 Global Step: 104150 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:15,206-Speed 3396.77 samples/sec Loss 2.5647 LearningRate 0.0062 Epoch: 20 Global Step: 104160 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:18,221-Speed 3397.79 samples/sec Loss 2.5249 LearningRate 0.0062 Epoch: 20 Global Step: 104170 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:21,240-Speed 3391.50 samples/sec Loss 2.5446 LearningRate 0.0062 Epoch: 20 Global Step: 104180 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:24,305-Speed 3342.41 samples/sec Loss 2.7057 LearningRate 0.0062 Epoch: 20 Global Step: 104190 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:27,328-Speed 3388.04 samples/sec Loss 2.5524 LearningRate 0.0062 Epoch: 20 Global Step: 104200 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:30,387-Speed 3349.22 samples/sec Loss 2.6392 LearningRate 0.0062 Epoch: 20 Global Step: 104210 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:33,376-Speed 3426.36 samples/sec Loss 2.5382 LearningRate 0.0062 Epoch: 20 Global Step: 104220 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:36,379-Speed 3410.77 samples/sec Loss 2.5992 LearningRate 0.0062 Epoch: 20 Global Step: 104230 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-01-20 03:42:39,390-Speed 3402.03 samples/sec Loss 2.6723 LearningRate 0.0062 Epoch: 20 Global Step: 104240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:42:42,374-Speed 3432.38 samples/sec Loss 2.6580 LearningRate 0.0062 Epoch: 20 Global Step: 104250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:42:45,360-Speed 3429.63 samples/sec Loss 2.6609 LearningRate 0.0062 Epoch: 20 Global Step: 104260 
Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:42:48,366-Speed 3407.78 samples/sec Loss 2.5285 LearningRate 0.0062 Epoch: 20 Global Step: 104270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:42:51,347-Speed 3435.70 samples/sec Loss 2.4701 LearningRate 0.0061 Epoch: 20 Global Step: 104280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:42:54,334-Speed 3429.42 samples/sec Loss 2.5643 LearningRate 0.0061 Epoch: 20 Global Step: 104290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:42:57,422-Speed 3317.61 samples/sec Loss 2.6575 LearningRate 0.0061 Epoch: 20 Global Step: 104300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:43:00,524-Speed 3301.86 samples/sec Loss 2.4233 LearningRate 0.0061 Epoch: 20 Global Step: 104310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:43:03,508-Speed 3433.54 samples/sec Loss 2.5813 LearningRate 0.0061 Epoch: 20 Global Step: 104320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:43:06,501-Speed 3422.46 samples/sec Loss 2.5772 LearningRate 0.0061 Epoch: 20 Global Step: 104330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:43:09,578-Speed 3328.92 samples/sec Loss 2.5000 LearningRate 0.0061 Epoch: 20 Global Step: 104340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:12,691-Speed 3290.37 samples/sec Loss 2.6185 LearningRate 0.0061 Epoch: 20 Global Step: 104350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:15,672-Speed 3435.42 samples/sec Loss 2.6264 LearningRate 0.0061 Epoch: 20 Global Step: 104360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:18,693-Speed 3390.98 samples/sec Loss 2.6352 LearningRate 0.0061 Epoch: 20 Global Step: 104370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:21,820-Speed 3275.92 samples/sec Loss 2.4697 LearningRate 0.0061 Epoch: 20 Global Step: 104380 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-01-20 03:43:24,974-Speed 3247.31 samples/sec Loss 2.5575 LearningRate 0.0061 Epoch: 20 Global Step: 104390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:28,008-Speed 3376.94 samples/sec Loss 2.6773 LearningRate 0.0061 Epoch: 20 Global Step: 104400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:31,044-Speed 3372.97 samples/sec Loss 2.6336 LearningRate 0.0061 Epoch: 20 Global Step: 104410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:34,045-Speed 3413.91 samples/sec Loss 2.5566 LearningRate 0.0061 Epoch: 20 Global Step: 104420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:37,029-Speed 3431.82 samples/sec Loss 2.5509 LearningRate 0.0061 Epoch: 20 Global Step: 104430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:40,047-Speed 3393.82 samples/sec Loss 2.6246 LearningRate 0.0061 Epoch: 20 Global Step: 104440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:43:43,049-Speed 3412.75 samples/sec Loss 2.6868 LearningRate 0.0061 Epoch: 20 Global Step: 104450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:43:46,026-Speed 3440.20 samples/sec Loss 2.6115 LearningRate 0.0060 Epoch: 20 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:43:48,986-Speed 3459.76 samples/sec Loss 2.5289 LearningRate 0.0060 Epoch: 20 Global Step: 104470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:52,020-Speed 3376.46 samples/sec Loss 2.6451 LearningRate 0.0060 Epoch: 20 Global Step: 104480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:55,130-Speed 3294.08 samples/sec Loss 2.5935 LearningRate 0.0060 Epoch: 20 Global Step: 104490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:43:58,155-Speed 3385.49 samples/sec Loss 2.5207 LearningRate 0.0060 Epoch: 20 Global Step: 104500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:44:01,177-Speed 
3389.09 samples/sec Loss 2.4779 LearningRate 0.0060 Epoch: 20 Global Step: 104510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:44:04,161-Speed 3433.48 samples/sec Loss 2.6009 LearningRate 0.0060 Epoch: 20 Global Step: 104520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:44:07,142-Speed 3436.11 samples/sec Loss 2.5855 LearningRate 0.0060 Epoch: 20 Global Step: 104530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:44:10,130-Speed 3427.61 samples/sec Loss 2.7503 LearningRate 0.0060 Epoch: 20 Global Step: 104540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:44:13,112-Speed 3434.45 samples/sec Loss 2.6715 LearningRate 0.0060 Epoch: 20 Global Step: 104550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:44:16,158-Speed 3363.04 samples/sec Loss 2.5548 LearningRate 0.0060 Epoch: 20 Global Step: 104560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:44:19,141-Speed 3434.61 samples/sec Loss 2.4537 LearningRate 0.0060 Epoch: 20 Global Step: 104570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:44:22,129-Speed 3427.47 samples/sec Loss 2.5649 LearningRate 0.0060 Epoch: 20 Global Step: 104580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:44:25,117-Speed 3428.38 samples/sec Loss 2.4930 LearningRate 0.0060 Epoch: 20 Global Step: 104590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:44:28,085-Speed 3451.36 samples/sec Loss 2.5999 LearningRate 0.0060 Epoch: 20 Global Step: 104600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:44:31,045-Speed 3460.01 samples/sec Loss 2.5720 LearningRate 0.0060 Epoch: 20 Global Step: 104610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:34,027-Speed 3434.82 samples/sec Loss 2.6650 LearningRate 0.0060 Epoch: 20 Global Step: 104620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:37,010-Speed 3433.26 samples/sec Loss 2.7082 
LearningRate 0.0060 Epoch: 20 Global Step: 104630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:40,022-Speed 3401.50 samples/sec Loss 2.5440 LearningRate 0.0059 Epoch: 20 Global Step: 104640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:43,019-Speed 3417.22 samples/sec Loss 2.6443 LearningRate 0.0059 Epoch: 20 Global Step: 104650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:46,085-Speed 3340.98 samples/sec Loss 2.6016 LearningRate 0.0059 Epoch: 20 Global Step: 104660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:49,108-Speed 3389.25 samples/sec Loss 2.5012 LearningRate 0.0059 Epoch: 20 Global Step: 104670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:52,084-Speed 3440.95 samples/sec Loss 2.6165 LearningRate 0.0059 Epoch: 20 Global Step: 104680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:55,065-Speed 3436.23 samples/sec Loss 2.5148 LearningRate 0.0059 Epoch: 20 Global Step: 104690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:44:58,044-Speed 3438.51 samples/sec Loss 2.5151 LearningRate 0.0059 Epoch: 20 Global Step: 104700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:45:01,022-Speed 3438.98 samples/sec Loss 2.5536 LearningRate 0.0059 Epoch: 20 Global Step: 104710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:04,025-Speed 3410.85 samples/sec Loss 2.5246 LearningRate 0.0059 Epoch: 20 Global Step: 104720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:07,010-Speed 3431.46 samples/sec Loss 2.6594 LearningRate 0.0059 Epoch: 20 Global Step: 104730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:09,992-Speed 3435.07 samples/sec Loss 2.5359 LearningRate 0.0059 Epoch: 20 Global Step: 104740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:12,974-Speed 3434.44 samples/sec Loss 2.4114 LearningRate 0.0059 Epoch: 20 Global Step: 
104750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:15,987-Speed 3400.30 samples/sec Loss 2.5830 LearningRate 0.0059 Epoch: 20 Global Step: 104760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:18,964-Speed 3441.07 samples/sec Loss 2.5561 LearningRate 0.0059 Epoch: 20 Global Step: 104770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:21,958-Speed 3420.97 samples/sec Loss 2.6553 LearningRate 0.0059 Epoch: 20 Global Step: 104780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:25,027-Speed 3337.40 samples/sec Loss 2.7295 LearningRate 0.0059 Epoch: 20 Global Step: 104790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:28,007-Speed 3436.91 samples/sec Loss 2.6017 LearningRate 0.0059 Epoch: 20 Global Step: 104800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:30,987-Speed 3437.32 samples/sec Loss 2.4475 LearningRate 0.0059 Epoch: 20 Global Step: 104810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:45:33,970-Speed 3433.75 samples/sec Loss 2.6238 LearningRate 0.0059 Epoch: 20 Global Step: 104820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:45:36,958-Speed 3428.01 samples/sec Loss 2.5172 LearningRate 0.0058 Epoch: 20 Global Step: 104830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:39,949-Speed 3425.08 samples/sec Loss 2.5834 LearningRate 0.0058 Epoch: 20 Global Step: 104840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:42,937-Speed 3427.10 samples/sec Loss 2.6782 LearningRate 0.0058 Epoch: 20 Global Step: 104850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:45,928-Speed 3424.97 samples/sec Loss 2.5568 LearningRate 0.0058 Epoch: 20 Global Step: 104860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:48,912-Speed 3432.95 samples/sec Loss 2.5776 LearningRate 0.0058 Epoch: 20 Global Step: 104870 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-01-20 03:45:51,898-Speed 3429.63 samples/sec Loss 2.5859 LearningRate 0.0058 Epoch: 20 Global Step: 104880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:54,884-Speed 3430.46 samples/sec Loss 2.6457 LearningRate 0.0058 Epoch: 20 Global Step: 104890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:45:57,883-Speed 3415.42 samples/sec Loss 2.4957 LearningRate 0.0058 Epoch: 20 Global Step: 104900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:00,867-Speed 3431.92 samples/sec Loss 2.5676 LearningRate 0.0058 Epoch: 20 Global Step: 104910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:03,851-Speed 3433.96 samples/sec Loss 2.5430 LearningRate 0.0058 Epoch: 20 Global Step: 104920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:06,837-Speed 3429.87 samples/sec Loss 2.5284 LearningRate 0.0058 Epoch: 20 Global Step: 104930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:46:09,822-Speed 3432.40 samples/sec Loss 2.5418 LearningRate 0.0058 Epoch: 20 Global Step: 104940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:46:12,837-Speed 3396.79 samples/sec Loss 2.5854 LearningRate 0.0058 Epoch: 20 Global Step: 104950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:46:15,814-Speed 3440.54 samples/sec Loss 2.5399 LearningRate 0.0058 Epoch: 20 Global Step: 104960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:18,800-Speed 3430.71 samples/sec Loss 2.6171 LearningRate 0.0058 Epoch: 20 Global Step: 104970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:21,798-Speed 3416.45 samples/sec Loss 2.6159 LearningRate 0.0058 Epoch: 20 Global Step: 104980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:24,854-Speed 3350.78 samples/sec Loss 2.4960 LearningRate 0.0058 Epoch: 20 Global Step: 104990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
03:46:27,935-Speed 3325.23 samples/sec Loss 2.6182 LearningRate 0.0058 Epoch: 20 Global Step: 105000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:30,917-Speed 3434.37 samples/sec Loss 2.5256 LearningRate 0.0057 Epoch: 20 Global Step: 105010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:33,926-Speed 3403.50 samples/sec Loss 2.5442 LearningRate 0.0057 Epoch: 20 Global Step: 105020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:46:36,950-Speed 3388.14 samples/sec Loss 2.5880 LearningRate 0.0057 Epoch: 20 Global Step: 105030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:46:39,945-Speed 3420.19 samples/sec Loss 2.5928 LearningRate 0.0057 Epoch: 20 Global Step: 105040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:46:42,926-Speed 3435.08 samples/sec Loss 2.6131 LearningRate 0.0057 Epoch: 20 Global Step: 105050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:46:45,906-Speed 3437.71 samples/sec Loss 2.5283 LearningRate 0.0057 Epoch: 20 Global Step: 105060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:46:48,885-Speed 3438.81 samples/sec Loss 2.6139 LearningRate 0.0057 Epoch: 20 Global Step: 105070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:46:51,888-Speed 3411.11 samples/sec Loss 2.6100 LearningRate 0.0057 Epoch: 20 Global Step: 105080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:46:54,895-Speed 3405.85 samples/sec Loss 2.6709 LearningRate 0.0057 Epoch: 20 Global Step: 105090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:46:57,879-Speed 3432.65 samples/sec Loss 2.5678 LearningRate 0.0057 Epoch: 20 Global Step: 105100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:47:00,865-Speed 3429.63 samples/sec Loss 2.5638 LearningRate 0.0057 Epoch: 20 Global Step: 105110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:47:03,848-Speed 3434.69 samples/sec Loss 
2.5412 LearningRate 0.0057 Epoch: 20 Global Step: 105120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:47:06,832-Speed 3432.42 samples/sec Loss 2.5088 LearningRate 0.0057 Epoch: 20 Global Step: 105130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:09,831-Speed 3415.49 samples/sec Loss 2.4497 LearningRate 0.0057 Epoch: 20 Global Step: 105140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:12,818-Speed 3429.37 samples/sec Loss 2.5772 LearningRate 0.0057 Epoch: 20 Global Step: 105150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:15,797-Speed 3438.89 samples/sec Loss 2.6617 LearningRate 0.0057 Epoch: 20 Global Step: 105160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:18,860-Speed 3344.39 samples/sec Loss 2.6514 LearningRate 0.0057 Epoch: 20 Global Step: 105170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:21,850-Speed 3425.39 samples/sec Loss 2.5092 LearningRate 0.0057 Epoch: 20 Global Step: 105180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:24,917-Speed 3339.83 samples/sec Loss 2.5903 LearningRate 0.0057 Epoch: 20 Global Step: 105190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:27,898-Speed 3436.06 samples/sec Loss 2.5801 LearningRate 0.0056 Epoch: 20 Global Step: 105200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:30,885-Speed 3429.89 samples/sec Loss 2.5012 LearningRate 0.0056 Epoch: 20 Global Step: 105210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:33,905-Speed 3391.97 samples/sec Loss 2.5656 LearningRate 0.0056 Epoch: 20 Global Step: 105220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:36,889-Speed 3432.41 samples/sec Loss 2.5627 LearningRate 0.0056 Epoch: 20 Global Step: 105230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:39,906-Speed 3395.76 samples/sec Loss 2.5425 LearningRate 0.0056 Epoch: 20 Global 
Step: 105240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:42,899-Speed 3421.87 samples/sec Loss 2.5475 LearningRate 0.0056 Epoch: 20 Global Step: 105250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:45,879-Speed 3436.79 samples/sec Loss 2.5313 LearningRate 0.0056 Epoch: 20 Global Step: 105260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:48,880-Speed 3413.13 samples/sec Loss 2.7023 LearningRate 0.0056 Epoch: 20 Global Step: 105270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:51,869-Speed 3426.05 samples/sec Loss 2.4790 LearningRate 0.0056 Epoch: 20 Global Step: 105280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:54,933-Speed 3343.47 samples/sec Loss 2.5752 LearningRate 0.0056 Epoch: 20 Global Step: 105290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:47:57,967-Speed 3376.44 samples/sec Loss 2.6184 LearningRate 0.0056 Epoch: 20 Global Step: 105300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:00,955-Speed 3427.40 samples/sec Loss 2.6014 LearningRate 0.0056 Epoch: 20 Global Step: 105310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:03,949-Speed 3421.45 samples/sec Loss 2.4898 LearningRate 0.0056 Epoch: 20 Global Step: 105320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:06,943-Speed 3420.81 samples/sec Loss 2.6391 LearningRate 0.0056 Epoch: 20 Global Step: 105330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:48:09,966-Speed 3388.85 samples/sec Loss 2.5484 LearningRate 0.0056 Epoch: 20 Global Step: 105340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:48:12,934-Speed 3450.15 samples/sec Loss 2.4559 LearningRate 0.0056 Epoch: 20 Global Step: 105350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:15,922-Speed 3427.67 samples/sec Loss 2.4901 LearningRate 0.0056 Epoch: 20 Global Step: 105360 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 03:48:18,950-Speed 3383.85 samples/sec Loss 2.6472 LearningRate 0.0056 Epoch: 20 Global Step: 105370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:21,936-Speed 3429.85 samples/sec Loss 2.6019 LearningRate 0.0056 Epoch: 20 Global Step: 105380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:24,972-Speed 3375.69 samples/sec Loss 2.5953 LearningRate 0.0055 Epoch: 20 Global Step: 105390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:28,071-Speed 3304.96 samples/sec Loss 2.6201 LearningRate 0.0055 Epoch: 20 Global Step: 105400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:31,052-Speed 3435.07 samples/sec Loss 2.5178 LearningRate 0.0055 Epoch: 20 Global Step: 105410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:34,035-Speed 3435.73 samples/sec Loss 2.5996 LearningRate 0.0055 Epoch: 20 Global Step: 105420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:37,018-Speed 3433.93 samples/sec Loss 2.6301 LearningRate 0.0055 Epoch: 20 Global Step: 105430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:39,999-Speed 3435.41 samples/sec Loss 2.5207 LearningRate 0.0055 Epoch: 20 Global Step: 105440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:42,989-Speed 3426.22 samples/sec Loss 2.4680 LearningRate 0.0055 Epoch: 20 Global Step: 105450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:48:45,954-Speed 3453.51 samples/sec Loss 2.4665 LearningRate 0.0055 Epoch: 20 Global Step: 105460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:48,990-Speed 3374.44 samples/sec Loss 2.5582 LearningRate 0.0055 Epoch: 20 Global Step: 105470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:52,009-Speed 3393.01 samples/sec Loss 2.5115 LearningRate 0.0055 Epoch: 20 Global Step: 105480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
03:48:54,998-Speed 3426.40 samples/sec Loss 2.5066 LearningRate 0.0055 Epoch: 20 Global Step: 105490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:48:57,986-Speed 3428.75 samples/sec Loss 2.5655 LearningRate 0.0055 Epoch: 20 Global Step: 105500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:00,964-Speed 3438.40 samples/sec Loss 2.6104 LearningRate 0.0055 Epoch: 20 Global Step: 105510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:03,957-Speed 3422.60 samples/sec Loss 2.5101 LearningRate 0.0055 Epoch: 20 Global Step: 105520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:06,941-Speed 3432.75 samples/sec Loss 2.4253 LearningRate 0.0055 Epoch: 20 Global Step: 105530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:09,942-Speed 3412.92 samples/sec Loss 2.5171 LearningRate 0.0055 Epoch: 20 Global Step: 105540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:12,924-Speed 3435.01 samples/sec Loss 2.4040 LearningRate 0.0055 Epoch: 20 Global Step: 105550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:15,948-Speed 3386.59 samples/sec Loss 2.5470 LearningRate 0.0055 Epoch: 20 Global Step: 105560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:49:18,933-Speed 3431.72 samples/sec Loss 2.5258 LearningRate 0.0055 Epoch: 20 Global Step: 105570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:49:21,921-Speed 3429.16 samples/sec Loss 2.5029 LearningRate 0.0054 Epoch: 20 Global Step: 105580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:49:24,893-Speed 3446.49 samples/sec Loss 2.4875 LearningRate 0.0054 Epoch: 20 Global Step: 105590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:27,879-Speed 3430.67 samples/sec Loss 2.5675 LearningRate 0.0054 Epoch: 20 Global Step: 105600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:30,863-Speed 3431.75 samples/sec Loss 
2.6227 LearningRate 0.0054 Epoch: 20 Global Step: 105610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:33,852-Speed 3427.84 samples/sec Loss 2.5748 LearningRate 0.0054 Epoch: 20 Global Step: 105620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:36,829-Speed 3439.93 samples/sec Loss 2.5195 LearningRate 0.0054 Epoch: 20 Global Step: 105630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:39,814-Speed 3431.93 samples/sec Loss 2.6430 LearningRate 0.0054 Epoch: 20 Global Step: 105640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:42,808-Speed 3420.13 samples/sec Loss 2.5599 LearningRate 0.0054 Epoch: 20 Global Step: 105650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:45,872-Speed 3343.45 samples/sec Loss 2.5427 LearningRate 0.0054 Epoch: 20 Global Step: 105660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:48,859-Speed 3429.91 samples/sec Loss 2.5263 LearningRate 0.0054 Epoch: 20 Global Step: 105670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:51,843-Speed 3432.22 samples/sec Loss 2.6240 LearningRate 0.0054 Epoch: 20 Global Step: 105680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:49:55,018-Speed 3226.29 samples/sec Loss 2.6646 LearningRate 0.0054 Epoch: 20 Global Step: 105690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:49:58,015-Speed 3417.36 samples/sec Loss 2.5155 LearningRate 0.0054 Epoch: 20 Global Step: 105700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:50:00,999-Speed 3431.68 samples/sec Loss 2.5164 LearningRate 0.0054 Epoch: 20 Global Step: 105710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:50:04,043-Speed 3365.19 samples/sec Loss 2.4958 LearningRate 0.0054 Epoch: 20 Global Step: 105720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:50:07,049-Speed 3408.12 samples/sec Loss 2.6280 LearningRate 0.0054 Epoch: 20 Global 
Step: 105730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:50:10,038-Speed 3426.23 samples/sec Loss 2.5853 LearningRate 0.0054 Epoch: 20 Global Step: 105740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:50:13,051-Speed 3399.78 samples/sec Loss 2.5667 LearningRate 0.0054 Epoch: 20 Global Step: 105750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:50:16,036-Speed 3431.25 samples/sec Loss 2.6946 LearningRate 0.0054 Epoch: 20 Global Step: 105760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:50:19,080-Speed 3365.66 samples/sec Loss 2.5237 LearningRate 0.0053 Epoch: 20 Global Step: 105770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:50:22,066-Speed 3430.30 samples/sec Loss 2.6327 LearningRate 0.0053 Epoch: 20 Global Step: 105780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:25,140-Speed 3331.75 samples/sec Loss 2.6086 LearningRate 0.0053 Epoch: 20 Global Step: 105790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:28,229-Speed 3315.97 samples/sec Loss 2.6098 LearningRate 0.0053 Epoch: 20 Global Step: 105800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:31,224-Speed 3419.62 samples/sec Loss 2.4970 LearningRate 0.0053 Epoch: 20 Global Step: 105810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:34,253-Speed 3382.40 samples/sec Loss 2.5190 LearningRate 0.0053 Epoch: 20 Global Step: 105820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:37,258-Speed 3407.79 samples/sec Loss 2.5189 LearningRate 0.0053 Epoch: 20 Global Step: 105830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:40,375-Speed 3286.19 samples/sec Loss 2.5180 LearningRate 0.0053 Epoch: 20 Global Step: 105840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:43,437-Speed 3345.51 samples/sec Loss 2.6045 LearningRate 0.0053 Epoch: 20 Global Step: 105850 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 03:50:46,520-Speed 3323.18 samples/sec Loss 2.5389 LearningRate 0.0053 Epoch: 20 Global Step: 105860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:49,514-Speed 3421.78 samples/sec Loss 2.5315 LearningRate 0.0053 Epoch: 20 Global Step: 105870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:52,504-Speed 3425.25 samples/sec Loss 2.4936 LearningRate 0.0053 Epoch: 20 Global Step: 105880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:55,514-Speed 3403.02 samples/sec Loss 2.5526 LearningRate 0.0053 Epoch: 20 Global Step: 105890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:50:58,525-Speed 3400.95 samples/sec Loss 2.4845 LearningRate 0.0053 Epoch: 20 Global Step: 105900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:51:01,548-Speed 3388.72 samples/sec Loss 2.4353 LearningRate 0.0053 Epoch: 20 Global Step: 105910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:51:04,543-Speed 3419.44 samples/sec Loss 2.5191 LearningRate 0.0053 Epoch: 20 Global Step: 105920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:51:07,533-Speed 3425.19 samples/sec Loss 2.4953 LearningRate 0.0053 Epoch: 20 Global Step: 105930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:51:10,545-Speed 3402.27 samples/sec Loss 2.4135 LearningRate 0.0053 Epoch: 20 Global Step: 105940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:51:13,541-Speed 3418.93 samples/sec Loss 2.4275 LearningRate 0.0053 Epoch: 20 Global Step: 105950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:51:16,551-Speed 3403.70 samples/sec Loss 2.5941 LearningRate 0.0053 Epoch: 20 Global Step: 105960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:51:19,600-Speed 3359.55 samples/sec Loss 2.4646 LearningRate 0.0052 Epoch: 20 Global Step: 105970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
03:51:22,589-Speed 3426.61 samples/sec Loss 2.5409 LearningRate 0.0052 Epoch: 20 Global Step: 105980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:51:25,574-Speed 3431.22 samples/sec Loss 2.4476 LearningRate 0.0052 Epoch: 20 Global Step: 105990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:51:28,589-Speed 3396.79 samples/sec Loss 2.5222 LearningRate 0.0052 Epoch: 20 Global Step: 106000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:52:11,388-[lfw][106000]XNorm: 23.174001 Training: 2022-01-20 03:52:11,388-[lfw][106000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-20 03:52:11,389-[lfw][106000]Accuracy-Highest: 0.99833 Training: 2022-01-20 03:53:01,129-[cfp_fp][106000]XNorm: 22.188551 Training: 2022-01-20 03:53:01,130-[cfp_fp][106000]Accuracy-Flip: 0.98929+-0.00479 Training: 2022-01-20 03:53:01,130-[cfp_fp][106000]Accuracy-Highest: 0.98929 Training: 2022-01-20 03:53:43,852-[agedb_30][106000]XNorm: 23.240908 Training: 2022-01-20 03:53:43,853-[agedb_30][106000]Accuracy-Flip: 0.98367+-0.00581 Training: 2022-01-20 03:53:43,853-[agedb_30][106000]Accuracy-Highest: 0.98433 Training: 2022-01-20 03:53:46,869-Speed 74.05 samples/sec Loss 2.4782 LearningRate 0.0052 Epoch: 20 Global Step: 106010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:53:49,900-Speed 3378.59 samples/sec Loss 2.5831 LearningRate 0.0052 Epoch: 20 Global Step: 106020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:53:52,893-Speed 3421.92 samples/sec Loss 2.5375 LearningRate 0.0052 Epoch: 20 Global Step: 106030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:53:55,876-Speed 3434.10 samples/sec Loss 2.5147 LearningRate 0.0052 Epoch: 20 Global Step: 106040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:53:58,888-Speed 3400.15 samples/sec Loss 2.4995 LearningRate 0.0052 Epoch: 20 Global Step: 106050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:54:01,898-Speed 
3403.11 samples/sec Loss 2.5013 LearningRate 0.0052 Epoch: 20 Global Step: 106060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:54:04,897-Speed 3415.79 samples/sec Loss 2.4922 LearningRate 0.0052 Epoch: 20 Global Step: 106070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:54:07,955-Speed 3348.97 samples/sec Loss 2.5225 LearningRate 0.0052 Epoch: 20 Global Step: 106080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:54:10,977-Speed 3391.03 samples/sec Loss 2.5509 LearningRate 0.0052 Epoch: 20 Global Step: 106090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:54:13,969-Speed 3423.04 samples/sec Loss 2.5072 LearningRate 0.0052 Epoch: 20 Global Step: 106100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:54:16,974-Speed 3408.09 samples/sec Loss 2.5462 LearningRate 0.0052 Epoch: 20 Global Step: 106110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:54:19,991-Speed 3395.36 samples/sec Loss 2.5809 LearningRate 0.0052 Epoch: 20 Global Step: 106120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:23,037-Speed 3362.03 samples/sec Loss 2.5175 LearningRate 0.0052 Epoch: 20 Global Step: 106130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:26,056-Speed 3393.72 samples/sec Loss 2.4905 LearningRate 0.0052 Epoch: 20 Global Step: 106140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:29,044-Speed 3427.08 samples/sec Loss 2.3848 LearningRate 0.0052 Epoch: 20 Global Step: 106150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:32,045-Speed 3414.21 samples/sec Loss 2.5556 LearningRate 0.0051 Epoch: 20 Global Step: 106160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:35,065-Speed 3391.87 samples/sec Loss 2.5303 LearningRate 0.0051 Epoch: 20 Global Step: 106170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:38,126-Speed 3346.05 samples/sec Loss 2.4566 
LearningRate 0.0051 Epoch: 20 Global Step: 106180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:41,112-Speed 3430.34 samples/sec Loss 2.6666 LearningRate 0.0051 Epoch: 20 Global Step: 106190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:44,102-Speed 3425.72 samples/sec Loss 2.4204 LearningRate 0.0051 Epoch: 20 Global Step: 106200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:47,234-Speed 3270.31 samples/sec Loss 2.4847 LearningRate 0.0051 Epoch: 20 Global Step: 106210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:54:59,397-Speed 841.97 samples/sec Loss 2.3616 LearningRate 0.0051 Epoch: 21 Global Step: 106220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:02,470-Speed 3333.61 samples/sec Loss 1.7660 LearningRate 0.0051 Epoch: 21 Global Step: 106230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:05,477-Speed 3406.85 samples/sec Loss 1.8835 LearningRate 0.0051 Epoch: 21 Global Step: 106240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:08,494-Speed 3394.42 samples/sec Loss 1.7920 LearningRate 0.0051 Epoch: 21 Global Step: 106250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:11,544-Speed 3358.29 samples/sec Loss 1.9392 LearningRate 0.0051 Epoch: 21 Global Step: 106260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:14,540-Speed 3418.28 samples/sec Loss 1.8746 LearningRate 0.0051 Epoch: 21 Global Step: 106270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:17,504-Speed 3455.79 samples/sec Loss 1.8189 LearningRate 0.0051 Epoch: 21 Global Step: 106280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:20,490-Speed 3431.61 samples/sec Loss 1.7898 LearningRate 0.0051 Epoch: 21 Global Step: 106290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:23,511-Speed 3389.49 samples/sec Loss 1.8818 LearningRate 0.0051 Epoch: 21 Global Step: 
106300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:26,500-Speed 3427.65 samples/sec Loss 1.7375 LearningRate 0.0051 Epoch: 21 Global Step: 106310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:29,571-Speed 3334.46 samples/sec Loss 1.7599 LearningRate 0.0051 Epoch: 21 Global Step: 106320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:32,558-Speed 3429.52 samples/sec Loss 1.9217 LearningRate 0.0051 Epoch: 21 Global Step: 106330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:35,668-Speed 3293.57 samples/sec Loss 1.8920 LearningRate 0.0051 Epoch: 21 Global Step: 106340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:38,779-Speed 3292.55 samples/sec Loss 1.8504 LearningRate 0.0051 Epoch: 21 Global Step: 106350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:41,816-Speed 3372.76 samples/sec Loss 1.9493 LearningRate 0.0050 Epoch: 21 Global Step: 106360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:44,904-Speed 3317.15 samples/sec Loss 1.8022 LearningRate 0.0050 Epoch: 21 Global Step: 106370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:55:47,890-Speed 3429.35 samples/sec Loss 1.9099 LearningRate 0.0050 Epoch: 21 Global Step: 106380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:50,881-Speed 3424.90 samples/sec Loss 1.8329 LearningRate 0.0050 Epoch: 21 Global Step: 106390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:53,991-Speed 3293.22 samples/sec Loss 1.8214 LearningRate 0.0050 Epoch: 21 Global Step: 106400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:55:57,137-Speed 3256.93 samples/sec Loss 1.8227 LearningRate 0.0050 Epoch: 21 Global Step: 106410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:00,193-Speed 3351.00 samples/sec Loss 1.8437 LearningRate 0.0050 Epoch: 21 Global Step: 106420 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-01-20 03:56:03,185-Speed 3423.58 samples/sec Loss 1.8298 LearningRate 0.0050 Epoch: 21 Global Step: 106430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:06,160-Speed 3443.07 samples/sec Loss 1.8074 LearningRate 0.0050 Epoch: 21 Global Step: 106440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:09,144-Speed 3432.38 samples/sec Loss 1.8580 LearningRate 0.0050 Epoch: 21 Global Step: 106450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:12,208-Speed 3343.04 samples/sec Loss 1.9223 LearningRate 0.0050 Epoch: 21 Global Step: 106460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:15,193-Speed 3431.80 samples/sec Loss 1.8149 LearningRate 0.0050 Epoch: 21 Global Step: 106470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:18,190-Speed 3416.93 samples/sec Loss 1.8227 LearningRate 0.0050 Epoch: 21 Global Step: 106480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:21,219-Speed 3382.36 samples/sec Loss 1.8763 LearningRate 0.0050 Epoch: 21 Global Step: 106490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:24,213-Speed 3420.10 samples/sec Loss 1.8323 LearningRate 0.0050 Epoch: 21 Global Step: 106500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:56:27,198-Speed 3431.86 samples/sec Loss 1.8198 LearningRate 0.0050 Epoch: 21 Global Step: 106510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:30,186-Speed 3427.93 samples/sec Loss 1.8097 LearningRate 0.0050 Epoch: 21 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:33,164-Speed 3439.79 samples/sec Loss 1.8403 LearningRate 0.0050 Epoch: 21 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:36,239-Speed 3330.74 samples/sec Loss 1.8787 LearningRate 0.0050 Epoch: 21 Global Step: 106540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 
03:56:39,225-Speed 3430.16 samples/sec Loss 1.7717 LearningRate 0.0050 Epoch: 21 Global Step: 106550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:42,266-Speed 3368.43 samples/sec Loss 1.9886 LearningRate 0.0049 Epoch: 21 Global Step: 106560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:45,372-Speed 3297.19 samples/sec Loss 1.8223 LearningRate 0.0049 Epoch: 21 Global Step: 106570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:48,383-Speed 3402.23 samples/sec Loss 1.8852 LearningRate 0.0049 Epoch: 21 Global Step: 106580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:51,446-Speed 3343.88 samples/sec Loss 1.9292 LearningRate 0.0049 Epoch: 21 Global Step: 106590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:54,443-Speed 3417.84 samples/sec Loss 1.9918 LearningRate 0.0049 Epoch: 21 Global Step: 106600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:56:57,422-Speed 3438.77 samples/sec Loss 1.9035 LearningRate 0.0049 Epoch: 21 Global Step: 106610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:57:00,415-Speed 3421.93 samples/sec Loss 1.8687 LearningRate 0.0049 Epoch: 21 Global Step: 106620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:57:03,404-Speed 3426.52 samples/sec Loss 1.9469 LearningRate 0.0049 Epoch: 21 Global Step: 106630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:57:06,391-Speed 3428.77 samples/sec Loss 1.9450 LearningRate 0.0049 Epoch: 21 Global Step: 106640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:57:09,388-Speed 3417.92 samples/sec Loss 1.9036 LearningRate 0.0049 Epoch: 21 Global Step: 106650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:57:12,411-Speed 3388.95 samples/sec Loss 1.9487 LearningRate 0.0049 Epoch: 21 Global Step: 106660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:15,393-Speed 3434.06 samples/sec Loss 
1.8471 LearningRate 0.0049 Epoch: 21 Global Step: 106670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:18,371-Speed 3439.17 samples/sec Loss 1.8943 LearningRate 0.0049 Epoch: 21 Global Step: 106680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:21,357-Speed 3431.48 samples/sec Loss 1.8961 LearningRate 0.0049 Epoch: 21 Global Step: 106690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:24,344-Speed 3429.10 samples/sec Loss 1.8259 LearningRate 0.0049 Epoch: 21 Global Step: 106700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:27,338-Speed 3420.89 samples/sec Loss 2.0178 LearningRate 0.0049 Epoch: 21 Global Step: 106710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:30,335-Speed 3417.35 samples/sec Loss 1.9596 LearningRate 0.0049 Epoch: 21 Global Step: 106720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:33,367-Speed 3377.50 samples/sec Loss 1.9493 LearningRate 0.0049 Epoch: 21 Global Step: 106730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:36,406-Speed 3370.82 samples/sec Loss 1.9406 LearningRate 0.0049 Epoch: 21 Global Step: 106740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:39,393-Speed 3428.67 samples/sec Loss 1.8668 LearningRate 0.0049 Epoch: 21 Global Step: 106750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:42,378-Speed 3432.90 samples/sec Loss 1.9143 LearningRate 0.0048 Epoch: 21 Global Step: 106760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 03:57:45,355-Speed 3440.28 samples/sec Loss 1.8897 LearningRate 0.0048 Epoch: 21 Global Step: 106770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:48,400-Speed 3363.19 samples/sec Loss 1.8976 LearningRate 0.0048 Epoch: 21 Global Step: 106780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:51,432-Speed 3379.03 samples/sec Loss 1.9331 LearningRate 0.0048 Epoch: 21 Global 
Step: 106790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:54,411-Speed 3437.08 samples/sec Loss 1.9356 LearningRate 0.0048 Epoch: 21 Global Step: 106800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:57:57,394-Speed 3434.22 samples/sec Loss 1.9962 LearningRate 0.0048 Epoch: 21 Global Step: 106810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:00,387-Speed 3421.64 samples/sec Loss 1.8584 LearningRate 0.0048 Epoch: 21 Global Step: 106820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:03,364-Speed 3441.08 samples/sec Loss 1.9176 LearningRate 0.0048 Epoch: 21 Global Step: 106830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:06,344-Speed 3437.59 samples/sec Loss 1.8552 LearningRate 0.0048 Epoch: 21 Global Step: 106840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:09,326-Speed 3434.42 samples/sec Loss 1.8216 LearningRate 0.0048 Epoch: 21 Global Step: 106850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:12,309-Speed 3433.57 samples/sec Loss 1.9377 LearningRate 0.0048 Epoch: 21 Global Step: 106860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:15,255-Speed 3477.69 samples/sec Loss 1.9206 LearningRate 0.0048 Epoch: 21 Global Step: 106870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:18,252-Speed 3416.41 samples/sec Loss 1.9807 LearningRate 0.0048 Epoch: 21 Global Step: 106880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:21,311-Speed 3348.73 samples/sec Loss 1.9310 LearningRate 0.0048 Epoch: 21 Global Step: 106890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:24,319-Speed 3405.03 samples/sec Loss 1.8890 LearningRate 0.0048 Epoch: 21 Global Step: 106900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:27,324-Speed 3409.45 samples/sec Loss 1.9419 LearningRate 0.0048 Epoch: 21 Global Step: 106910 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-20 03:58:30,369-Speed 3362.96 samples/sec Loss 1.9275 LearningRate 0.0048 Epoch: 21 Global Step: 106920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:33,387-Speed 3394.18 samples/sec Loss 1.9115 LearningRate 0.0048 Epoch: 21 Global Step: 106930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:36,369-Speed 3434.38 samples/sec Loss 1.8853 LearningRate 0.0048 Epoch: 21 Global Step: 106940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:39,384-Speed 3397.55 samples/sec Loss 1.9967 LearningRate 0.0048 Epoch: 21 Global Step: 106950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:42,452-Speed 3339.15 samples/sec Loss 2.0018 LearningRate 0.0048 Epoch: 21 Global Step: 106960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:58:45,544-Speed 3312.71 samples/sec Loss 1.8576 LearningRate 0.0047 Epoch: 21 Global Step: 106970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:48,519-Speed 3441.70 samples/sec Loss 2.0074 LearningRate 0.0047 Epoch: 21 Global Step: 106980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:51,502-Speed 3433.72 samples/sec Loss 1.9963 LearningRate 0.0047 Epoch: 21 Global Step: 106990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:54,480-Speed 3440.01 samples/sec Loss 1.9107 LearningRate 0.0047 Epoch: 21 Global Step: 107000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:58:57,475-Speed 3419.91 samples/sec Loss 1.9862 LearningRate 0.0047 Epoch: 21 Global Step: 107010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:00,454-Speed 3437.92 samples/sec Loss 1.9462 LearningRate 0.0047 Epoch: 21 Global Step: 107020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:03,450-Speed 3419.07 samples/sec Loss 1.8809 LearningRate 0.0047 Epoch: 21 Global Step: 107030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
03:59:06,419-Speed 3449.58 samples/sec Loss 1.9173 LearningRate 0.0047 Epoch: 21 Global Step: 107040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:09,407-Speed 3428.71 samples/sec Loss 1.9162 LearningRate 0.0047 Epoch: 21 Global Step: 107050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:12,508-Speed 3302.37 samples/sec Loss 2.0743 LearningRate 0.0047 Epoch: 21 Global Step: 107060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:15,541-Speed 3377.05 samples/sec Loss 1.9763 LearningRate 0.0047 Epoch: 21 Global Step: 107070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:18,724-Speed 3218.15 samples/sec Loss 1.9250 LearningRate 0.0047 Epoch: 21 Global Step: 107080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:21,747-Speed 3387.83 samples/sec Loss 2.0036 LearningRate 0.0047 Epoch: 21 Global Step: 107090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:24,734-Speed 3429.08 samples/sec Loss 1.9385 LearningRate 0.0047 Epoch: 21 Global Step: 107100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:27,723-Speed 3427.01 samples/sec Loss 2.0029 LearningRate 0.0047 Epoch: 21 Global Step: 107110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:30,724-Speed 3413.52 samples/sec Loss 1.8417 LearningRate 0.0047 Epoch: 21 Global Step: 107120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:33,706-Speed 3434.34 samples/sec Loss 1.9181 LearningRate 0.0047 Epoch: 21 Global Step: 107130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 03:59:36,689-Speed 3433.87 samples/sec Loss 1.9645 LearningRate 0.0047 Epoch: 21 Global Step: 107140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:39,709-Speed 3391.72 samples/sec Loss 1.8961 LearningRate 0.0047 Epoch: 21 Global Step: 107150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:42,746-Speed 3372.64 samples/sec Loss 
1.9651 LearningRate 0.0047 Epoch: 21 Global Step: 107160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:45,733-Speed 3429.25 samples/sec Loss 2.0577 LearningRate 0.0046 Epoch: 21 Global Step: 107170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:48,721-Speed 3427.26 samples/sec Loss 1.8735 LearningRate 0.0046 Epoch: 21 Global Step: 107180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:51,712-Speed 3424.57 samples/sec Loss 1.9032 LearningRate 0.0046 Epoch: 21 Global Step: 107190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:54,700-Speed 3427.96 samples/sec Loss 1.8228 LearningRate 0.0046 Epoch: 21 Global Step: 107200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 03:59:57,747-Speed 3361.53 samples/sec Loss 2.0072 LearningRate 0.0046 Epoch: 21 Global Step: 107210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:00,778-Speed 3380.15 samples/sec Loss 1.9295 LearningRate 0.0046 Epoch: 21 Global Step: 107220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:03,780-Speed 3411.66 samples/sec Loss 2.0417 LearningRate 0.0046 Epoch: 21 Global Step: 107230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:06,784-Speed 3410.12 samples/sec Loss 2.0284 LearningRate 0.0046 Epoch: 21 Global Step: 107240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:00:09,806-Speed 3389.48 samples/sec Loss 1.9505 LearningRate 0.0046 Epoch: 21 Global Step: 107250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:00:12,788-Speed 3434.14 samples/sec Loss 1.9459 LearningRate 0.0046 Epoch: 21 Global Step: 107260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:00:15,780-Speed 3423.71 samples/sec Loss 1.8869 LearningRate 0.0046 Epoch: 21 Global Step: 107270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:00:18,766-Speed 3430.13 samples/sec Loss 1.9065 LearningRate 0.0046 Epoch: 21 Global 
Step: 107280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:00:21,752-Speed 3429.96 samples/sec Loss 2.0091 LearningRate 0.0046 Epoch: 21 Global Step: 107290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:00:24,764-Speed 3400.42 samples/sec Loss 1.9843 LearningRate 0.0046 Epoch: 21 Global Step: 107300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:27,748-Speed 3433.29 samples/sec Loss 2.0112 LearningRate 0.0046 Epoch: 21 Global Step: 107310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:30,730-Speed 3434.31 samples/sec Loss 1.9924 LearningRate 0.0046 Epoch: 21 Global Step: 107320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:33,712-Speed 3434.66 samples/sec Loss 1.8877 LearningRate 0.0046 Epoch: 21 Global Step: 107330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:36,697-Speed 3432.25 samples/sec Loss 1.9704 LearningRate 0.0046 Epoch: 21 Global Step: 107340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:39,681-Speed 3431.75 samples/sec Loss 2.0161 LearningRate 0.0046 Epoch: 21 Global Step: 107350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:42,661-Speed 3437.90 samples/sec Loss 1.9898 LearningRate 0.0046 Epoch: 21 Global Step: 107360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:45,644-Speed 3433.41 samples/sec Loss 2.0457 LearningRate 0.0046 Epoch: 21 Global Step: 107370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:48,634-Speed 3425.04 samples/sec Loss 1.8582 LearningRate 0.0045 Epoch: 21 Global Step: 107380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:51,620-Speed 3430.80 samples/sec Loss 2.0325 LearningRate 0.0045 Epoch: 21 Global Step: 107390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:00:54,603-Speed 3433.79 samples/sec Loss 1.8988 LearningRate 0.0045 Epoch: 21 Global Step: 107400 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-20 04:00:57,587-Speed 3432.58 samples/sec Loss 1.9800 LearningRate 0.0045 Epoch: 21 Global Step: 107410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:00,600-Speed 3399.32 samples/sec Loss 1.9289 LearningRate 0.0045 Epoch: 21 Global Step: 107420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:03,580-Speed 3436.89 samples/sec Loss 1.8961 LearningRate 0.0045 Epoch: 21 Global Step: 107430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:06,567-Speed 3429.77 samples/sec Loss 1.9462 LearningRate 0.0045 Epoch: 21 Global Step: 107440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:09,547-Speed 3436.36 samples/sec Loss 1.8896 LearningRate 0.0045 Epoch: 21 Global Step: 107450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:12,598-Speed 3356.99 samples/sec Loss 1.9859 LearningRate 0.0045 Epoch: 21 Global Step: 107460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:15,600-Speed 3411.93 samples/sec Loss 1.9050 LearningRate 0.0045 Epoch: 21 Global Step: 107470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:18,595-Speed 3419.81 samples/sec Loss 1.9517 LearningRate 0.0045 Epoch: 21 Global Step: 107480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:21,585-Speed 3426.06 samples/sec Loss 1.9383 LearningRate 0.0045 Epoch: 21 Global Step: 107490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:24,570-Speed 3431.50 samples/sec Loss 1.9864 LearningRate 0.0045 Epoch: 21 Global Step: 107500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:27,557-Speed 3429.62 samples/sec Loss 1.9376 LearningRate 0.0045 Epoch: 21 Global Step: 107510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:01:30,546-Speed 3426.71 samples/sec Loss 1.9856 LearningRate 0.0045 Epoch: 21 Global Step: 107520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 
04:01:33,528-Speed 3434.81 samples/sec Loss 1.9058 LearningRate 0.0045 Epoch: 21 Global Step: 107530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:01:36,516-Speed 3428.16 samples/sec Loss 1.9580 LearningRate 0.0045 Epoch: 21 Global Step: 107540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:01:39,503-Speed 3428.88 samples/sec Loss 1.9763 LearningRate 0.0045 Epoch: 21 Global Step: 107550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:01:42,545-Speed 3366.38 samples/sec Loss 1.9560 LearningRate 0.0045 Epoch: 21 Global Step: 107560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:01:45,528-Speed 3434.19 samples/sec Loss 1.9133 LearningRate 0.0045 Epoch: 21 Global Step: 107570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:01:48,509-Speed 3435.88 samples/sec Loss 2.0439 LearningRate 0.0045 Epoch: 21 Global Step: 107580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:51,493-Speed 3433.24 samples/sec Loss 1.9498 LearningRate 0.0044 Epoch: 21 Global Step: 107590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:54,481-Speed 3426.99 samples/sec Loss 1.9869 LearningRate 0.0044 Epoch: 21 Global Step: 107600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:01:57,536-Speed 3353.66 samples/sec Loss 2.0193 LearningRate 0.0044 Epoch: 21 Global Step: 107610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:00,543-Speed 3405.50 samples/sec Loss 1.9785 LearningRate 0.0044 Epoch: 21 Global Step: 107620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:03,540-Speed 3417.54 samples/sec Loss 2.0310 LearningRate 0.0044 Epoch: 21 Global Step: 107630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:06,547-Speed 3407.16 samples/sec Loss 1.9762 LearningRate 0.0044 Epoch: 21 Global Step: 107640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:09,591-Speed 3365.26 samples/sec Loss 
1.8958 LearningRate 0.0044 Epoch: 21 Global Step: 107650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:12,583-Speed 3422.70 samples/sec Loss 2.0599 LearningRate 0.0044 Epoch: 21 Global Step: 107660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:15,573-Speed 3425.56 samples/sec Loss 1.9937 LearningRate 0.0044 Epoch: 21 Global Step: 107670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:18,611-Speed 3372.20 samples/sec Loss 2.0167 LearningRate 0.0044 Epoch: 21 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:02:21,628-Speed 3395.43 samples/sec Loss 1.9724 LearningRate 0.0044 Epoch: 21 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:02:24,607-Speed 3437.69 samples/sec Loss 1.8880 LearningRate 0.0044 Epoch: 21 Global Step: 107700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:27,610-Speed 3411.61 samples/sec Loss 1.9786 LearningRate 0.0044 Epoch: 21 Global Step: 107710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:30,638-Speed 3382.46 samples/sec Loss 1.9281 LearningRate 0.0044 Epoch: 21 Global Step: 107720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:33,656-Speed 3393.68 samples/sec Loss 1.9114 LearningRate 0.0044 Epoch: 21 Global Step: 107730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:36,654-Speed 3416.32 samples/sec Loss 1.9240 LearningRate 0.0044 Epoch: 21 Global Step: 107740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:39,652-Speed 3416.79 samples/sec Loss 2.0221 LearningRate 0.0044 Epoch: 21 Global Step: 107750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:42,638-Speed 3429.94 samples/sec Loss 1.9326 LearningRate 0.0044 Epoch: 21 Global Step: 107760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:45,629-Speed 3425.20 samples/sec Loss 2.0257 LearningRate 0.0044 Epoch: 21 Global 
Step: 107770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:48,626-Speed 3417.51 samples/sec Loss 1.9910 LearningRate 0.0044 Epoch: 21 Global Step: 107780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:51,636-Speed 3431.97 samples/sec Loss 1.9669 LearningRate 0.0044 Epoch: 21 Global Step: 107790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:54,607-Speed 3447.30 samples/sec Loss 1.9612 LearningRate 0.0044 Epoch: 21 Global Step: 107800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:02:57,647-Speed 3432.14 samples/sec Loss 2.0007 LearningRate 0.0043 Epoch: 21 Global Step: 107810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:00,634-Speed 3429.94 samples/sec Loss 1.9977 LearningRate 0.0043 Epoch: 21 Global Step: 107820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:03,619-Speed 3430.32 samples/sec Loss 1.9874 LearningRate 0.0043 Epoch: 21 Global Step: 107830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:06,667-Speed 3435.73 samples/sec Loss 1.9731 LearningRate 0.0043 Epoch: 21 Global Step: 107840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:09,648-Speed 3436.26 samples/sec Loss 1.9467 LearningRate 0.0043 Epoch: 21 Global Step: 107850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:12,720-Speed 3404.51 samples/sec Loss 2.0136 LearningRate 0.0043 Epoch: 21 Global Step: 107860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:15,794-Speed 3331.72 samples/sec Loss 1.9711 LearningRate 0.0043 Epoch: 21 Global Step: 107870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:18,823-Speed 3381.45 samples/sec Loss 2.0102 LearningRate 0.0043 Epoch: 21 Global Step: 107880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:21,869-Speed 3363.01 samples/sec Loss 2.0043 LearningRate 0.0043 Epoch: 21 Global Step: 107890 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 04:03:24,919-Speed 3357.83 samples/sec Loss 2.0031 LearningRate 0.0043 Epoch: 21 Global Step: 107900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:03:27,915-Speed 3418.91 samples/sec Loss 1.9628 LearningRate 0.0043 Epoch: 21 Global Step: 107910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:03:30,906-Speed 3423.83 samples/sec Loss 2.0174 LearningRate 0.0043 Epoch: 21 Global Step: 107920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:33,959-Speed 3355.44 samples/sec Loss 2.0743 LearningRate 0.0043 Epoch: 21 Global Step: 107930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:36,996-Speed 3372.88 samples/sec Loss 2.0014 LearningRate 0.0043 Epoch: 21 Global Step: 107940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:39,989-Speed 3421.91 samples/sec Loss 1.9237 LearningRate 0.0043 Epoch: 21 Global Step: 107950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:43,001-Speed 3401.21 samples/sec Loss 1.9220 LearningRate 0.0043 Epoch: 21 Global Step: 107960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:46,004-Speed 3410.58 samples/sec Loss 1.9685 LearningRate 0.0043 Epoch: 21 Global Step: 107970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:48,990-Speed 3429.67 samples/sec Loss 1.9289 LearningRate 0.0043 Epoch: 21 Global Step: 107980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:51,972-Speed 3435.54 samples/sec Loss 1.9628 LearningRate 0.0043 Epoch: 21 Global Step: 107990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:03:54,936-Speed 3455.23 samples/sec Loss 1.9095 LearningRate 0.0043 Epoch: 21 Global Step: 108000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:04:38,007-[lfw][108000]XNorm: 22.740100 Training: 2022-01-20 04:04:38,008-[lfw][108000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-01-20 
04:04:38,008-[lfw][108000]Accuracy-Highest: 0.99833 Training: 2022-01-20 04:05:27,936-[cfp_fp][108000]XNorm: 21.819698 Training: 2022-01-20 04:05:27,937-[cfp_fp][108000]Accuracy-Flip: 0.98871+-0.00505 Training: 2022-01-20 04:05:27,938-[cfp_fp][108000]Accuracy-Highest: 0.98929 Training: 2022-01-20 04:06:10,759-[agedb_30][108000]XNorm: 22.808241 Training: 2022-01-20 04:06:10,760-[agedb_30][108000]Accuracy-Flip: 0.98433+-0.00651 Training: 2022-01-20 04:06:10,761-[agedb_30][108000]Accuracy-Highest: 0.98433 Training: 2022-01-20 04:06:13,761-Speed 73.76 samples/sec Loss 1.9772 LearningRate 0.0043 Epoch: 21 Global Step: 108010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:06:16,737-Speed 3442.29 samples/sec Loss 2.0758 LearningRate 0.0042 Epoch: 21 Global Step: 108020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:06:19,708-Speed 3447.10 samples/sec Loss 1.9333 LearningRate 0.0042 Epoch: 21 Global Step: 108030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:06:22,798-Speed 3314.94 samples/sec Loss 2.0983 LearningRate 0.0042 Epoch: 21 Global Step: 108040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:06:25,853-Speed 3353.03 samples/sec Loss 2.0663 LearningRate 0.0042 Epoch: 21 Global Step: 108050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:06:28,883-Speed 3379.96 samples/sec Loss 2.0056 LearningRate 0.0042 Epoch: 21 Global Step: 108060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:06:31,861-Speed 3439.82 samples/sec Loss 1.9369 LearningRate 0.0042 Epoch: 21 Global Step: 108070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:06:34,866-Speed 3408.33 samples/sec Loss 2.0423 LearningRate 0.0042 Epoch: 21 Global Step: 108080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:06:38,011-Speed 3256.32 samples/sec Loss 2.0637 LearningRate 0.0042 Epoch: 21 Global Step: 108090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 
04:06:41,267-Speed 3146.18 samples/sec Loss 1.8259 LearningRate 0.0042 Epoch: 21 Global Step: 108100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:06:44,276-Speed 3403.08 samples/sec Loss 2.0295 LearningRate 0.0042 Epoch: 21 Global Step: 108110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:06:47,279-Speed 3411.34 samples/sec Loss 1.9633 LearningRate 0.0042 Epoch: 21 Global Step: 108120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:06:50,287-Speed 3404.90 samples/sec Loss 2.0499 LearningRate 0.0042 Epoch: 21 Global Step: 108130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:06:53,400-Speed 3290.60 samples/sec Loss 2.0521 LearningRate 0.0042 Epoch: 21 Global Step: 108140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:06:56,391-Speed 3424.90 samples/sec Loss 1.9828 LearningRate 0.0042 Epoch: 21 Global Step: 108150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:06:59,389-Speed 3415.48 samples/sec Loss 1.9531 LearningRate 0.0042 Epoch: 21 Global Step: 108160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:02,446-Speed 3351.63 samples/sec Loss 2.0170 LearningRate 0.0042 Epoch: 21 Global Step: 108170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:05,447-Speed 3412.38 samples/sec Loss 1.9594 LearningRate 0.0042 Epoch: 21 Global Step: 108180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:08,477-Speed 3381.04 samples/sec Loss 1.9325 LearningRate 0.0042 Epoch: 21 Global Step: 108190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:11,461-Speed 3433.32 samples/sec Loss 1.9088 LearningRate 0.0042 Epoch: 21 Global Step: 108200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:07:14,509-Speed 3360.33 samples/sec Loss 1.9903 LearningRate 0.0042 Epoch: 21 Global Step: 108210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:17,707-Speed 3202.03 samples/sec Loss 
2.0404 LearningRate 0.0042 Epoch: 21 Global Step: 108220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:20,687-Speed 3437.50 samples/sec Loss 2.0987 LearningRate 0.0042 Epoch: 21 Global Step: 108230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:23,681-Speed 3421.65 samples/sec Loss 1.9756 LearningRate 0.0041 Epoch: 21 Global Step: 108240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:26,659-Speed 3438.42 samples/sec Loss 2.0280 LearningRate 0.0041 Epoch: 21 Global Step: 108250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:29,638-Speed 3438.55 samples/sec Loss 2.0536 LearningRate 0.0041 Epoch: 21 Global Step: 108260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:32,616-Speed 3439.43 samples/sec Loss 1.9721 LearningRate 0.0041 Epoch: 21 Global Step: 108270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:35,596-Speed 3437.57 samples/sec Loss 1.9854 LearningRate 0.0041 Epoch: 21 Global Step: 108280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:38,594-Speed 3416.87 samples/sec Loss 2.0397 LearningRate 0.0041 Epoch: 21 Global Step: 108290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:41,569-Speed 3442.46 samples/sec Loss 1.9889 LearningRate 0.0041 Epoch: 21 Global Step: 108300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:44,529-Speed 3459.78 samples/sec Loss 2.0356 LearningRate 0.0041 Epoch: 21 Global Step: 108310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:47,512-Speed 3434.97 samples/sec Loss 1.9810 LearningRate 0.0041 Epoch: 21 Global Step: 108320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:50,491-Speed 3437.82 samples/sec Loss 1.9426 LearningRate 0.0041 Epoch: 21 Global Step: 108330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:53,468-Speed 3440.17 samples/sec Loss 1.9934 LearningRate 0.0041 Epoch: 21 Global 
Step: 108340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:56,447-Speed 3438.70 samples/sec Loss 2.0102 LearningRate 0.0041 Epoch: 21 Global Step: 108350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:07:59,430-Speed 3433.51 samples/sec Loss 2.0164 LearningRate 0.0041 Epoch: 21 Global Step: 108360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:02,413-Speed 3433.35 samples/sec Loss 1.9938 LearningRate 0.0041 Epoch: 21 Global Step: 108370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:05,389-Speed 3442.55 samples/sec Loss 2.0611 LearningRate 0.0041 Epoch: 21 Global Step: 108380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:08,360-Speed 3447.04 samples/sec Loss 1.9985 LearningRate 0.0041 Epoch: 21 Global Step: 108390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:11,347-Speed 3428.95 samples/sec Loss 2.0502 LearningRate 0.0041 Epoch: 21 Global Step: 108400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:14,330-Speed 3433.98 samples/sec Loss 1.9344 LearningRate 0.0041 Epoch: 21 Global Step: 108410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:08:17,290-Speed 3460.40 samples/sec Loss 2.0483 LearningRate 0.0041 Epoch: 21 Global Step: 108420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:20,357-Speed 3339.77 samples/sec Loss 2.0628 LearningRate 0.0041 Epoch: 21 Global Step: 108430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:23,388-Speed 3378.75 samples/sec Loss 1.9968 LearningRate 0.0041 Epoch: 21 Global Step: 108440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:26,441-Speed 3354.83 samples/sec Loss 2.0249 LearningRate 0.0041 Epoch: 21 Global Step: 108450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:29,420-Speed 3437.98 samples/sec Loss 2.0664 LearningRate 0.0040 Epoch: 21 Global Step: 108460 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 04:08:32,407-Speed 3429.70 samples/sec Loss 1.9573 LearningRate 0.0040 Epoch: 21 Global Step: 108470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:35,381-Speed 3444.82 samples/sec Loss 1.9592 LearningRate 0.0040 Epoch: 21 Global Step: 108480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:38,359-Speed 3438.78 samples/sec Loss 1.9649 LearningRate 0.0040 Epoch: 21 Global Step: 108490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:41,334-Speed 3442.76 samples/sec Loss 1.9565 LearningRate 0.0040 Epoch: 21 Global Step: 108500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:44,310-Speed 3441.99 samples/sec Loss 1.9433 LearningRate 0.0040 Epoch: 21 Global Step: 108510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:47,294-Speed 3432.50 samples/sec Loss 2.0323 LearningRate 0.0040 Epoch: 21 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:08:50,273-Speed 3439.00 samples/sec Loss 1.9876 LearningRate 0.0040 Epoch: 21 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:08:53,253-Speed 3438.03 samples/sec Loss 1.9390 LearningRate 0.0040 Epoch: 21 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:08:56,220-Speed 3451.52 samples/sec Loss 2.0148 LearningRate 0.0040 Epoch: 21 Global Step: 108550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:08:59,215-Speed 3419.53 samples/sec Loss 2.0851 LearningRate 0.0040 Epoch: 21 Global Step: 108560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:02,201-Speed 3431.23 samples/sec Loss 2.0926 LearningRate 0.0040 Epoch: 21 Global Step: 108570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:05,186-Speed 3431.29 samples/sec Loss 1.9933 LearningRate 0.0040 Epoch: 21 Global Step: 108580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
04:09:08,163-Speed 3441.25 samples/sec Loss 2.0561 LearningRate 0.0040 Epoch: 21 Global Step: 108590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:11,186-Speed 3387.77 samples/sec Loss 1.9523 LearningRate 0.0040 Epoch: 21 Global Step: 108600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:14,204-Speed 3393.65 samples/sec Loss 1.9993 LearningRate 0.0040 Epoch: 21 Global Step: 108610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:17,184-Speed 3437.38 samples/sec Loss 2.0635 LearningRate 0.0040 Epoch: 21 Global Step: 108620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:20,168-Speed 3431.93 samples/sec Loss 2.0799 LearningRate 0.0040 Epoch: 21 Global Step: 108630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:23,176-Speed 3405.87 samples/sec Loss 2.1047 LearningRate 0.0040 Epoch: 21 Global Step: 108640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:26,171-Speed 3419.42 samples/sec Loss 2.0226 LearningRate 0.0040 Epoch: 21 Global Step: 108650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:09:29,139-Speed 3450.83 samples/sec Loss 1.9259 LearningRate 0.0040 Epoch: 21 Global Step: 108660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:32,164-Speed 3387.19 samples/sec Loss 2.0381 LearningRate 0.0040 Epoch: 21 Global Step: 108670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:35,171-Speed 3405.27 samples/sec Loss 1.9935 LearningRate 0.0039 Epoch: 21 Global Step: 108680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:38,166-Speed 3420.02 samples/sec Loss 2.0491 LearningRate 0.0039 Epoch: 21 Global Step: 108690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:41,169-Speed 3411.03 samples/sec Loss 1.9698 LearningRate 0.0039 Epoch: 21 Global Step: 108700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:09:44,132-Speed 3456.95 samples/sec Loss 
1.9393 LearningRate 0.0039 Epoch: 21 Global Step: 108710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:09:47,110-Speed 3438.93 samples/sec Loss 2.0548 LearningRate 0.0039 Epoch: 21 Global Step: 108720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:09:50,087-Speed 3441.01 samples/sec Loss 1.9356 LearningRate 0.0039 Epoch: 21 Global Step: 108730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:09:53,092-Speed 3408.22 samples/sec Loss 1.9747 LearningRate 0.0039 Epoch: 21 Global Step: 108740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:09:56,084-Speed 3423.59 samples/sec Loss 1.9920 LearningRate 0.0039 Epoch: 21 Global Step: 108750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:09:59,063-Speed 3438.73 samples/sec Loss 2.0115 LearningRate 0.0039 Epoch: 21 Global Step: 108760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:02,053-Speed 3425.43 samples/sec Loss 1.8542 LearningRate 0.0039 Epoch: 21 Global Step: 108770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:05,108-Speed 3352.91 samples/sec Loss 2.0224 LearningRate 0.0039 Epoch: 21 Global Step: 108780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:08,100-Speed 3422.81 samples/sec Loss 2.0009 LearningRate 0.0039 Epoch: 21 Global Step: 108790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:11,079-Speed 3437.97 samples/sec Loss 1.9751 LearningRate 0.0039 Epoch: 21 Global Step: 108800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:14,059-Speed 3437.95 samples/sec Loss 1.9638 LearningRate 0.0039 Epoch: 21 Global Step: 108810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:10:17,076-Speed 3394.96 samples/sec Loss 1.9497 LearningRate 0.0039 Epoch: 21 Global Step: 108820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:10:20,059-Speed 3432.78 samples/sec Loss 2.0493 LearningRate 0.0039 Epoch: 21 Global 
Step: 108830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:10:23,048-Speed 3427.41 samples/sec Loss 2.0839 LearningRate 0.0039 Epoch: 21 Global Step: 108840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:10:26,095-Speed 3362.26 samples/sec Loss 2.0267 LearningRate 0.0039 Epoch: 21 Global Step: 108850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:10:29,080-Speed 3430.82 samples/sec Loss 2.0491 LearningRate 0.0039 Epoch: 21 Global Step: 108860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:10:32,046-Speed 3453.67 samples/sec Loss 2.0267 LearningRate 0.0039 Epoch: 21 Global Step: 108870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:35,063-Speed 3394.92 samples/sec Loss 1.9826 LearningRate 0.0039 Epoch: 21 Global Step: 108880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:38,041-Speed 3439.40 samples/sec Loss 2.0226 LearningRate 0.0039 Epoch: 21 Global Step: 108890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:41,027-Speed 3429.93 samples/sec Loss 1.9724 LearningRate 0.0039 Epoch: 21 Global Step: 108900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:44,006-Speed 3438.41 samples/sec Loss 1.8891 LearningRate 0.0038 Epoch: 21 Global Step: 108910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:46,991-Speed 3431.31 samples/sec Loss 2.0377 LearningRate 0.0038 Epoch: 21 Global Step: 108920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:49,979-Speed 3428.09 samples/sec Loss 1.9971 LearningRate 0.0038 Epoch: 21 Global Step: 108930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:52,959-Speed 3437.39 samples/sec Loss 2.0116 LearningRate 0.0038 Epoch: 21 Global Step: 108940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:10:55,938-Speed 3438.59 samples/sec Loss 2.0185 LearningRate 0.0038 Epoch: 21 Global Step: 108950 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-20 04:10:58,920-Speed 3434.53 samples/sec Loss 2.0523 LearningRate 0.0038 Epoch: 21 Global Step: 108960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:11:01,906-Speed 3430.24 samples/sec Loss 2.0005 LearningRate 0.0038 Epoch: 21 Global Step: 108970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:04,891-Speed 3431.79 samples/sec Loss 2.0352 LearningRate 0.0038 Epoch: 21 Global Step: 108980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:07,891-Speed 3413.83 samples/sec Loss 1.9883 LearningRate 0.0038 Epoch: 21 Global Step: 108990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:10,876-Speed 3431.85 samples/sec Loss 1.9572 LearningRate 0.0038 Epoch: 21 Global Step: 109000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:13,858-Speed 3434.69 samples/sec Loss 2.0428 LearningRate 0.0038 Epoch: 21 Global Step: 109010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:16,845-Speed 3428.31 samples/sec Loss 2.1272 LearningRate 0.0038 Epoch: 21 Global Step: 109020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:19,847-Speed 3412.47 samples/sec Loss 1.9647 LearningRate 0.0038 Epoch: 21 Global Step: 109030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:22,872-Speed 3385.75 samples/sec Loss 2.0254 LearningRate 0.0038 Epoch: 21 Global Step: 109040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:25,884-Speed 3400.83 samples/sec Loss 2.0281 LearningRate 0.0038 Epoch: 21 Global Step: 109050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:28,861-Speed 3440.44 samples/sec Loss 1.9924 LearningRate 0.0038 Epoch: 21 Global Step: 109060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:31,865-Speed 3409.79 samples/sec Loss 2.0745 LearningRate 0.0038 Epoch: 21 Global Step: 109070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 
04:11:34,846-Speed 3435.49 samples/sec Loss 2.0240 LearningRate 0.0038 Epoch: 21 Global Step: 109080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:11:37,814-Speed 3451.19 samples/sec Loss 2.0571 LearningRate 0.0038 Epoch: 21 Global Step: 109090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:40,806-Speed 3423.87 samples/sec Loss 2.0390 LearningRate 0.0038 Epoch: 21 Global Step: 109100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:11:43,786-Speed 3436.88 samples/sec Loss 2.0109 LearningRate 0.0038 Epoch: 21 Global Step: 109110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:11:46,796-Speed 3403.14 samples/sec Loss 2.0050 LearningRate 0.0038 Epoch: 21 Global Step: 109120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:11:49,903-Speed 3297.74 samples/sec Loss 2.0306 LearningRate 0.0038 Epoch: 21 Global Step: 109130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:11:52,989-Speed 3319.01 samples/sec Loss 1.9744 LearningRate 0.0037 Epoch: 21 Global Step: 109140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:11:56,129-Speed 3261.91 samples/sec Loss 1.9502 LearningRate 0.0037 Epoch: 21 Global Step: 109150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:11:59,193-Speed 3343.19 samples/sec Loss 2.0507 LearningRate 0.0037 Epoch: 21 Global Step: 109160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:12:02,192-Speed 3415.06 samples/sec Loss 2.0209 LearningRate 0.0037 Epoch: 21 Global Step: 109170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:12:05,191-Speed 3415.15 samples/sec Loss 2.0107 LearningRate 0.0037 Epoch: 21 Global Step: 109180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:12:08,171-Speed 3437.07 samples/sec Loss 1.9051 LearningRate 0.0037 Epoch: 21 Global Step: 109190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:12:11,154-Speed 3433.92 samples/sec Loss 
2.0769 LearningRate 0.0037 Epoch: 21 Global Step: 109200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:12:14,144-Speed 3426.71 samples/sec Loss 2.0359 LearningRate 0.0037 Epoch: 21 Global Step: 109210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:17,130-Speed 3430.53 samples/sec Loss 1.9440 LearningRate 0.0037 Epoch: 21 Global Step: 109220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:20,115-Speed 3431.15 samples/sec Loss 1.9039 LearningRate 0.0037 Epoch: 21 Global Step: 109230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:23,104-Speed 3425.99 samples/sec Loss 2.0387 LearningRate 0.0037 Epoch: 21 Global Step: 109240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:26,088-Speed 3432.29 samples/sec Loss 2.0977 LearningRate 0.0037 Epoch: 21 Global Step: 109250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:29,090-Speed 3413.18 samples/sec Loss 1.9288 LearningRate 0.0037 Epoch: 21 Global Step: 109260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:32,070-Speed 3436.22 samples/sec Loss 1.9977 LearningRate 0.0037 Epoch: 21 Global Step: 109270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:35,066-Speed 3418.87 samples/sec Loss 2.0297 LearningRate 0.0037 Epoch: 21 Global Step: 109280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:38,051-Speed 3431.00 samples/sec Loss 2.1559 LearningRate 0.0037 Epoch: 21 Global Step: 109290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:41,037-Speed 3430.93 samples/sec Loss 2.0952 LearningRate 0.0037 Epoch: 21 Global Step: 109300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:44,022-Speed 3432.07 samples/sec Loss 1.9437 LearningRate 0.0037 Epoch: 21 Global Step: 109310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:12:46,989-Speed 3451.46 samples/sec Loss 1.9543 LearningRate 0.0037 Epoch: 21 Global 
Step: 109320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:49,974-Speed 3431.46 samples/sec Loss 1.9886 LearningRate 0.0037 Epoch: 21 Global Step: 109330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:12:53,061-Speed 3318.34 samples/sec Loss 1.9337 LearningRate 0.0037 Epoch: 21 Global Step: 109340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:12:56,042-Speed 3434.89 samples/sec Loss 2.0158 LearningRate 0.0037 Epoch: 21 Global Step: 109350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:12:59,056-Speed 3398.58 samples/sec Loss 2.0811 LearningRate 0.0037 Epoch: 21 Global Step: 109360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:13:02,038-Speed 3435.45 samples/sec Loss 2.0327 LearningRate 0.0036 Epoch: 21 Global Step: 109370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:13:05,070-Speed 3377.93 samples/sec Loss 1.8734 LearningRate 0.0036 Epoch: 21 Global Step: 109380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:13:08,051-Speed 3436.61 samples/sec Loss 2.1136 LearningRate 0.0036 Epoch: 21 Global Step: 109390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:13:11,034-Speed 3433.54 samples/sec Loss 1.9708 LearningRate 0.0036 Epoch: 21 Global Step: 109400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:13:14,024-Speed 3426.01 samples/sec Loss 2.0101 LearningRate 0.0036 Epoch: 21 Global Step: 109410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:13:17,004-Speed 3437.16 samples/sec Loss 2.0039 LearningRate 0.0036 Epoch: 21 Global Step: 109420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:13:19,986-Speed 3435.19 samples/sec Loss 1.9971 LearningRate 0.0036 Epoch: 21 Global Step: 109430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:13:22,976-Speed 3425.91 samples/sec Loss 1.9956 LearningRate 0.0036 Epoch: 21 Global Step: 109440 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 04:13:25,976-Speed 3414.12 samples/sec Loss 2.0096 LearningRate 0.0036 Epoch: 21 Global Step: 109450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:28,970-Speed 3420.67 samples/sec Loss 1.9977 LearningRate 0.0036 Epoch: 21 Global Step: 109460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:31,960-Speed 3426.37 samples/sec Loss 2.0209 LearningRate 0.0036 Epoch: 21 Global Step: 109470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:34,956-Speed 3418.82 samples/sec Loss 1.9782 LearningRate 0.0036 Epoch: 21 Global Step: 109480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:37,954-Speed 3417.04 samples/sec Loss 1.9778 LearningRate 0.0036 Epoch: 21 Global Step: 109490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:41,014-Speed 3346.20 samples/sec Loss 1.9802 LearningRate 0.0036 Epoch: 21 Global Step: 109500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:43,996-Speed 3435.89 samples/sec Loss 1.9899 LearningRate 0.0036 Epoch: 21 Global Step: 109510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:47,104-Speed 3295.00 samples/sec Loss 1.9655 LearningRate 0.0036 Epoch: 21 Global Step: 109520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:50,099-Speed 3420.23 samples/sec Loss 2.0052 LearningRate 0.0036 Epoch: 21 Global Step: 109530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:53,098-Speed 3415.69 samples/sec Loss 2.0452 LearningRate 0.0036 Epoch: 21 Global Step: 109540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:56,085-Speed 3428.82 samples/sec Loss 2.1126 LearningRate 0.0036 Epoch: 21 Global Step: 109550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:13:59,148-Speed 3343.69 samples/sec Loss 2.0155 LearningRate 0.0036 Epoch: 21 Global Step: 109560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
04:14:02,213-Speed 3342.20 samples/sec Loss 1.9625 LearningRate 0.0036 Epoch: 21 Global Step: 109570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:05,198-Speed 3430.73 samples/sec Loss 2.0727 LearningRate 0.0036 Epoch: 21 Global Step: 109580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:08,178-Speed 3437.38 samples/sec Loss 2.0492 LearningRate 0.0036 Epoch: 21 Global Step: 109590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:11,217-Speed 3370.52 samples/sec Loss 2.0143 LearningRate 0.0036 Epoch: 21 Global Step: 109600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:14,215-Speed 3416.47 samples/sec Loss 2.1086 LearningRate 0.0035 Epoch: 21 Global Step: 109610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:17,222-Speed 3406.28 samples/sec Loss 2.0554 LearningRate 0.0035 Epoch: 21 Global Step: 109620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:20,207-Speed 3430.69 samples/sec Loss 2.0436 LearningRate 0.0035 Epoch: 21 Global Step: 109630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:23,222-Speed 3397.83 samples/sec Loss 2.0166 LearningRate 0.0035 Epoch: 21 Global Step: 109640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:14:26,209-Speed 3428.57 samples/sec Loss 2.0621 LearningRate 0.0035 Epoch: 21 Global Step: 109650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:14:29,175-Speed 3454.32 samples/sec Loss 2.0621 LearningRate 0.0035 Epoch: 21 Global Step: 109660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:32,179-Speed 3409.81 samples/sec Loss 1.8852 LearningRate 0.0035 Epoch: 21 Global Step: 109670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:35,200-Speed 3389.90 samples/sec Loss 2.0311 LearningRate 0.0035 Epoch: 21 Global Step: 109680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:38,189-Speed 3427.55 samples/sec Loss 
2.0163 LearningRate 0.0035 Epoch: 21 Global Step: 109690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:41,176-Speed 3428.61 samples/sec Loss 2.1047 LearningRate 0.0035 Epoch: 21 Global Step: 109700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:44,161-Speed 3431.37 samples/sec Loss 1.9745 LearningRate 0.0035 Epoch: 21 Global Step: 109710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:47,167-Speed 3407.50 samples/sec Loss 1.9942 LearningRate 0.0035 Epoch: 21 Global Step: 109720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:50,190-Speed 3387.88 samples/sec Loss 1.9948 LearningRate 0.0035 Epoch: 21 Global Step: 109730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:53,253-Speed 3344.57 samples/sec Loss 1.9277 LearningRate 0.0035 Epoch: 21 Global Step: 109740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:56,269-Speed 3395.53 samples/sec Loss 2.0205 LearningRate 0.0035 Epoch: 21 Global Step: 109750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:14:59,252-Speed 3433.96 samples/sec Loss 1.9859 LearningRate 0.0035 Epoch: 21 Global Step: 109760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:15:02,237-Speed 3431.09 samples/sec Loss 2.0345 LearningRate 0.0035 Epoch: 21 Global Step: 109770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:15:05,226-Speed 3427.30 samples/sec Loss 1.9812 LearningRate 0.0035 Epoch: 21 Global Step: 109780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:08,211-Speed 3431.57 samples/sec Loss 2.0456 LearningRate 0.0035 Epoch: 21 Global Step: 109790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:11,202-Speed 3424.24 samples/sec Loss 2.0339 LearningRate 0.0035 Epoch: 21 Global Step: 109800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:14,187-Speed 3431.94 samples/sec Loss 2.0237 LearningRate 0.0035 Epoch: 21 Global 
Step: 109810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:17,169-Speed 3434.26 samples/sec Loss 2.0682 LearningRate 0.0035 Epoch: 21 Global Step: 109820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:20,169-Speed 3413.79 samples/sec Loss 1.9441 LearningRate 0.0035 Epoch: 21 Global Step: 109830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:23,202-Speed 3377.15 samples/sec Loss 2.0430 LearningRate 0.0035 Epoch: 21 Global Step: 109840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:26,200-Speed 3417.49 samples/sec Loss 1.9733 LearningRate 0.0034 Epoch: 21 Global Step: 109850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:29,186-Speed 3429.54 samples/sec Loss 2.0382 LearningRate 0.0034 Epoch: 21 Global Step: 109860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:32,168-Speed 3435.45 samples/sec Loss 2.0421 LearningRate 0.0034 Epoch: 21 Global Step: 109870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:35,152-Speed 3431.56 samples/sec Loss 1.9727 LearningRate 0.0034 Epoch: 21 Global Step: 109880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:15:38,135-Speed 3433.90 samples/sec Loss 2.0482 LearningRate 0.0034 Epoch: 21 Global Step: 109890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:15:41,152-Speed 3395.77 samples/sec Loss 2.0159 LearningRate 0.0034 Epoch: 21 Global Step: 109900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:15:44,183-Speed 3379.13 samples/sec Loss 2.0610 LearningRate 0.0034 Epoch: 21 Global Step: 109910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:15:47,192-Speed 3403.72 samples/sec Loss 2.0074 LearningRate 0.0034 Epoch: 21 Global Step: 109920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:50,182-Speed 3424.67 samples/sec Loss 1.9667 LearningRate 0.0034 Epoch: 21 Global Step: 109930 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 04:15:53,167-Speed 3432.04 samples/sec Loss 2.0149 LearningRate 0.0034 Epoch: 21 Global Step: 109940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:56,159-Speed 3423.57 samples/sec Loss 1.9705 LearningRate 0.0034 Epoch: 21 Global Step: 109950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:15:59,147-Speed 3428.56 samples/sec Loss 1.9349 LearningRate 0.0034 Epoch: 21 Global Step: 109960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:16:02,225-Speed 3327.44 samples/sec Loss 2.0520 LearningRate 0.0034 Epoch: 21 Global Step: 109970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:16:05,211-Speed 3429.81 samples/sec Loss 2.0188 LearningRate 0.0034 Epoch: 21 Global Step: 109980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:16:08,196-Speed 3436.05 samples/sec Loss 2.0475 LearningRate 0.0034 Epoch: 21 Global Step: 109990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:16:11,180-Speed 3432.06 samples/sec Loss 1.9541 LearningRate 0.0034 Epoch: 21 Global Step: 110000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:16:54,079-[lfw][110000]XNorm: 22.293777 Training: 2022-01-20 04:16:54,080-[lfw][110000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-20 04:16:54,080-[lfw][110000]Accuracy-Highest: 0.99833 Training: 2022-01-20 04:17:43,891-[cfp_fp][110000]XNorm: 21.823405 Training: 2022-01-20 04:17:43,892-[cfp_fp][110000]Accuracy-Flip: 0.98800+-0.00430 Training: 2022-01-20 04:17:43,892-[cfp_fp][110000]Accuracy-Highest: 0.98929 Training: 2022-01-20 04:18:26,898-[agedb_30][110000]XNorm: 22.548703 Training: 2022-01-20 04:18:26,899-[agedb_30][110000]Accuracy-Flip: 0.98367+-0.00623 Training: 2022-01-20 04:18:26,900-[agedb_30][110000]Accuracy-Highest: 0.98433 Training: 2022-01-20 04:18:29,889-Speed 73.82 samples/sec Loss 2.0083 LearningRate 0.0034 Epoch: 21 Global Step: 110010 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-01-20 04:18:32,900-Speed 3401.76 samples/sec Loss 1.9286 LearningRate 0.0034 Epoch: 21 Global Step: 110020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:18:35,936-Speed 3373.93 samples/sec Loss 1.9210 LearningRate 0.0034 Epoch: 21 Global Step: 110030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:18:38,923-Speed 3429.35 samples/sec Loss 2.0029 LearningRate 0.0034 Epoch: 21 Global Step: 110040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:18:41,920-Speed 3417.66 samples/sec Loss 2.0826 LearningRate 0.0034 Epoch: 21 Global Step: 110050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:18:44,908-Speed 3428.17 samples/sec Loss 1.9592 LearningRate 0.0034 Epoch: 21 Global Step: 110060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:18:47,888-Speed 3438.08 samples/sec Loss 2.0227 LearningRate 0.0034 Epoch: 21 Global Step: 110070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:18:50,924-Speed 3372.88 samples/sec Loss 2.1018 LearningRate 0.0034 Epoch: 21 Global Step: 110080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:18:53,982-Speed 3350.33 samples/sec Loss 1.9785 LearningRate 0.0033 Epoch: 21 Global Step: 110090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:18:56,999-Speed 3394.36 samples/sec Loss 2.0582 LearningRate 0.0033 Epoch: 21 Global Step: 110100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:00,127-Speed 3274.41 samples/sec Loss 1.9451 LearningRate 0.0033 Epoch: 21 Global Step: 110110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:03,166-Speed 3370.47 samples/sec Loss 2.0258 LearningRate 0.0033 Epoch: 21 Global Step: 110120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:06,272-Speed 3298.29 samples/sec Loss 2.0449 LearningRate 0.0033 Epoch: 21 Global Step: 110130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:09,254-Speed 
3434.31 samples/sec Loss 2.1001 LearningRate 0.0033 Epoch: 21 Global Step: 110140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:19:12,243-Speed 3426.97 samples/sec Loss 1.9732 LearningRate 0.0033 Epoch: 21 Global Step: 110150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:19:15,257-Speed 3398.31 samples/sec Loss 1.9106 LearningRate 0.0033 Epoch: 21 Global Step: 110160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:19:18,287-Speed 3381.21 samples/sec Loss 1.9778 LearningRate 0.0033 Epoch: 21 Global Step: 110170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:19:21,343-Speed 3350.80 samples/sec Loss 2.0342 LearningRate 0.0033 Epoch: 21 Global Step: 110180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:24,327-Speed 3432.70 samples/sec Loss 1.9739 LearningRate 0.0033 Epoch: 21 Global Step: 110190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:27,316-Speed 3427.30 samples/sec Loss 2.0099 LearningRate 0.0033 Epoch: 21 Global Step: 110200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:30,342-Speed 3383.84 samples/sec Loss 1.9463 LearningRate 0.0033 Epoch: 21 Global Step: 110210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:33,327-Speed 3431.64 samples/sec Loss 1.9890 LearningRate 0.0033 Epoch: 21 Global Step: 110220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:36,306-Speed 3438.78 samples/sec Loss 1.9397 LearningRate 0.0033 Epoch: 21 Global Step: 110230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:39,279-Speed 3445.62 samples/sec Loss 1.9832 LearningRate 0.0033 Epoch: 21 Global Step: 110240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:42,308-Speed 3381.88 samples/sec Loss 2.0073 LearningRate 0.0033 Epoch: 21 Global Step: 110250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:45,437-Speed 3273.04 samples/sec Loss 2.0299 
LearningRate 0.0033 Epoch: 21 Global Step: 110260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:48,515-Speed 3327.27 samples/sec Loss 1.8520 LearningRate 0.0033 Epoch: 21 Global Step: 110270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:19:51,492-Speed 3440.80 samples/sec Loss 2.0772 LearningRate 0.0033 Epoch: 21 Global Step: 110280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:19:54,497-Speed 3408.50 samples/sec Loss 2.0233 LearningRate 0.0033 Epoch: 21 Global Step: 110290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:19:57,504-Speed 3406.70 samples/sec Loss 1.9666 LearningRate 0.0033 Epoch: 21 Global Step: 110300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:20:00,482-Speed 3439.03 samples/sec Loss 2.0070 LearningRate 0.0033 Epoch: 21 Global Step: 110310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:20:03,460-Speed 3440.21 samples/sec Loss 2.0207 LearningRate 0.0033 Epoch: 21 Global Step: 110320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:20:06,444-Speed 3431.63 samples/sec Loss 2.1259 LearningRate 0.0033 Epoch: 21 Global Step: 110330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:20:09,468-Speed 3388.19 samples/sec Loss 1.9513 LearningRate 0.0032 Epoch: 21 Global Step: 110340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:20:12,443-Speed 3442.00 samples/sec Loss 1.8556 LearningRate 0.0032 Epoch: 21 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:20:15,399-Speed 3465.59 samples/sec Loss 1.9359 LearningRate 0.0032 Epoch: 21 Global Step: 110360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:20:18,466-Speed 3339.47 samples/sec Loss 2.1237 LearningRate 0.0032 Epoch: 21 Global Step: 110370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:20:21,446-Speed 3437.40 samples/sec Loss 2.0745 LearningRate 0.0032 Epoch: 21 Global Step: 
110380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:20:24,403-Speed 3463.50 samples/sec Loss 1.9769 LearningRate 0.0032 Epoch: 21 Global Step: 110390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:27,380-Speed 3440.76 samples/sec Loss 2.0464 LearningRate 0.0032 Epoch: 21 Global Step: 110400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:30,365-Speed 3430.63 samples/sec Loss 1.9837 LearningRate 0.0032 Epoch: 21 Global Step: 110410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:33,372-Speed 3406.51 samples/sec Loss 2.0454 LearningRate 0.0032 Epoch: 21 Global Step: 110420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:36,356-Speed 3432.61 samples/sec Loss 1.9813 LearningRate 0.0032 Epoch: 21 Global Step: 110430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:39,345-Speed 3427.60 samples/sec Loss 1.9767 LearningRate 0.0032 Epoch: 21 Global Step: 110440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:42,328-Speed 3433.17 samples/sec Loss 2.1161 LearningRate 0.0032 Epoch: 21 Global Step: 110450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:45,323-Speed 3419.64 samples/sec Loss 1.9937 LearningRate 0.0032 Epoch: 21 Global Step: 110460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:48,296-Speed 3446.13 samples/sec Loss 2.0765 LearningRate 0.0032 Epoch: 21 Global Step: 110470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:51,282-Speed 3430.04 samples/sec Loss 1.9391 LearningRate 0.0032 Epoch: 21 Global Step: 110480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:20:54,264-Speed 3435.77 samples/sec Loss 1.9240 LearningRate 0.0032 Epoch: 21 Global Step: 110490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:20:57,247-Speed 3433.44 samples/sec Loss 2.0439 LearningRate 0.0032 Epoch: 21 Global Step: 110500 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-01-20 04:21:00,224-Speed 3440.07 samples/sec Loss 2.0631 LearningRate 0.0032 Epoch: 21 Global Step: 110510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:03,203-Speed 3438.55 samples/sec Loss 1.9867 LearningRate 0.0032 Epoch: 21 Global Step: 110520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:06,203-Speed 3414.24 samples/sec Loss 1.9803 LearningRate 0.0032 Epoch: 21 Global Step: 110530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:09,195-Speed 3423.70 samples/sec Loss 2.0767 LearningRate 0.0032 Epoch: 21 Global Step: 110540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:12,178-Speed 3433.24 samples/sec Loss 2.1006 LearningRate 0.0032 Epoch: 21 Global Step: 110550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:15,184-Speed 3407.08 samples/sec Loss 1.8671 LearningRate 0.0032 Epoch: 21 Global Step: 110560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:18,159-Speed 3444.00 samples/sec Loss 1.9774 LearningRate 0.0032 Epoch: 21 Global Step: 110570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:21,191-Speed 3377.13 samples/sec Loss 2.0384 LearningRate 0.0032 Epoch: 21 Global Step: 110580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:24,158-Speed 3453.57 samples/sec Loss 1.9498 LearningRate 0.0031 Epoch: 21 Global Step: 110590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:27,144-Speed 3429.28 samples/sec Loss 1.9932 LearningRate 0.0031 Epoch: 21 Global Step: 110600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:30,125-Speed 3437.42 samples/sec Loss 2.0239 LearningRate 0.0031 Epoch: 21 Global Step: 110610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:33,102-Speed 3441.03 samples/sec Loss 1.9107 LearningRate 0.0031 Epoch: 21 Global Step: 110620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 
04:21:36,078-Speed 3441.38 samples/sec Loss 2.1166 LearningRate 0.0031 Epoch: 21 Global Step: 110630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:39,056-Speed 3439.21 samples/sec Loss 2.1163 LearningRate 0.0031 Epoch: 21 Global Step: 110640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:42,047-Speed 3424.91 samples/sec Loss 2.0827 LearningRate 0.0031 Epoch: 21 Global Step: 110650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:45,043-Speed 3419.05 samples/sec Loss 1.9961 LearningRate 0.0031 Epoch: 21 Global Step: 110660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:48,060-Speed 3394.84 samples/sec Loss 2.0126 LearningRate 0.0031 Epoch: 21 Global Step: 110670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:51,131-Speed 3334.41 samples/sec Loss 1.9550 LearningRate 0.0031 Epoch: 21 Global Step: 110680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:54,097-Speed 3453.93 samples/sec Loss 2.0142 LearningRate 0.0031 Epoch: 21 Global Step: 110690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:21:57,060-Speed 3457.38 samples/sec Loss 1.9746 LearningRate 0.0031 Epoch: 21 Global Step: 110700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:00,041-Speed 3436.51 samples/sec Loss 1.9508 LearningRate 0.0031 Epoch: 21 Global Step: 110710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:03,068-Speed 3383.46 samples/sec Loss 1.9729 LearningRate 0.0031 Epoch: 21 Global Step: 110720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:06,042-Speed 3443.63 samples/sec Loss 2.0882 LearningRate 0.0031 Epoch: 21 Global Step: 110730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:09,018-Speed 3442.09 samples/sec Loss 2.0111 LearningRate 0.0031 Epoch: 21 Global Step: 110740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:11,993-Speed 3443.34 samples/sec Loss 
1.9843 LearningRate 0.0031 Epoch: 21 Global Step: 110750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:14,977-Speed 3432.07 samples/sec Loss 1.9230 LearningRate 0.0031 Epoch: 21 Global Step: 110760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:17,952-Speed 3442.83 samples/sec Loss 1.9056 LearningRate 0.0031 Epoch: 21 Global Step: 110770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:20,929-Speed 3440.57 samples/sec Loss 1.9919 LearningRate 0.0031 Epoch: 21 Global Step: 110780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:24,018-Speed 3316.16 samples/sec Loss 2.0074 LearningRate 0.0031 Epoch: 21 Global Step: 110790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-20 04:22:27,001-Speed 3433.22 samples/sec Loss 2.0174 LearningRate 0.0031 Epoch: 21 Global Step: 110800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:29,984-Speed 3434.48 samples/sec Loss 2.0226 LearningRate 0.0031 Epoch: 21 Global Step: 110810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:32,965-Speed 3435.94 samples/sec Loss 2.0841 LearningRate 0.0031 Epoch: 21 Global Step: 110820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:35,948-Speed 3433.35 samples/sec Loss 1.9219 LearningRate 0.0031 Epoch: 21 Global Step: 110830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:38,925-Speed 3440.38 samples/sec Loss 1.9530 LearningRate 0.0030 Epoch: 21 Global Step: 110840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:41,904-Speed 3438.69 samples/sec Loss 2.0325 LearningRate 0.0030 Epoch: 21 Global Step: 110850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:44,882-Speed 3440.89 samples/sec Loss 2.0609 LearningRate 0.0030 Epoch: 21 Global Step: 110860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:47,898-Speed 3395.68 samples/sec Loss 2.0342 LearningRate 0.0030 Epoch: 21 Global 
Step: 110870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:50,989-Speed 3313.84 samples/sec Loss 2.0505 LearningRate 0.0030 Epoch: 21 Global Step: 110880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:54,064-Speed 3331.48 samples/sec Loss 2.0536 LearningRate 0.0030 Epoch: 21 Global Step: 110890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:22:57,102-Speed 3371.09 samples/sec Loss 1.9820 LearningRate 0.0030 Epoch: 21 Global Step: 110900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:23:00,131-Speed 3381.79 samples/sec Loss 1.9489 LearningRate 0.0030 Epoch: 21 Global Step: 110910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:03,112-Speed 3435.49 samples/sec Loss 2.1182 LearningRate 0.0030 Epoch: 21 Global Step: 110920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:06,102-Speed 3425.68 samples/sec Loss 2.0021 LearningRate 0.0030 Epoch: 21 Global Step: 110930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:09,080-Speed 3440.24 samples/sec Loss 1.8827 LearningRate 0.0030 Epoch: 21 Global Step: 110940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:12,073-Speed 3422.16 samples/sec Loss 1.9973 LearningRate 0.0030 Epoch: 21 Global Step: 110950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:15,061-Speed 3428.09 samples/sec Loss 2.0217 LearningRate 0.0030 Epoch: 21 Global Step: 110960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:18,040-Speed 3437.82 samples/sec Loss 1.8875 LearningRate 0.0030 Epoch: 21 Global Step: 110970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:21,040-Speed 3414.29 samples/sec Loss 2.0510 LearningRate 0.0030 Epoch: 21 Global Step: 110980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:24,018-Speed 3440.08 samples/sec Loss 1.8871 LearningRate 0.0030 Epoch: 21 Global Step: 110990 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 04:23:27,001-Speed 3433.46 samples/sec Loss 1.9598 LearningRate 0.0030 Epoch: 21 Global Step: 111000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:29,987-Speed 3430.86 samples/sec Loss 2.0175 LearningRate 0.0030 Epoch: 21 Global Step: 111010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:32,994-Speed 3405.60 samples/sec Loss 1.8903 LearningRate 0.0030 Epoch: 21 Global Step: 111020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:36,003-Speed 3404.79 samples/sec Loss 2.0040 LearningRate 0.0030 Epoch: 21 Global Step: 111030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:38,983-Speed 3436.43 samples/sec Loss 1.9644 LearningRate 0.0030 Epoch: 21 Global Step: 111040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:41,998-Speed 3397.96 samples/sec Loss 1.9994 LearningRate 0.0030 Epoch: 21 Global Step: 111050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:45,046-Speed 3359.80 samples/sec Loss 1.9294 LearningRate 0.0030 Epoch: 21 Global Step: 111060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:48,188-Speed 3260.50 samples/sec Loss 1.9694 LearningRate 0.0030 Epoch: 21 Global Step: 111070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:51,281-Speed 3311.02 samples/sec Loss 1.9831 LearningRate 0.0030 Epoch: 21 Global Step: 111080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:54,281-Speed 3414.98 samples/sec Loss 1.9378 LearningRate 0.0030 Epoch: 21 Global Step: 111090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:23:57,355-Speed 3331.89 samples/sec Loss 1.9067 LearningRate 0.0029 Epoch: 21 Global Step: 111100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:00,345-Speed 3426.26 samples/sec Loss 1.9291 LearningRate 0.0029 Epoch: 21 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 
04:24:03,305-Speed 3460.48 samples/sec Loss 2.0586 LearningRate 0.0029 Epoch: 21 Global Step: 111120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:06,287-Speed 3434.68 samples/sec Loss 1.9790 LearningRate 0.0029 Epoch: 21 Global Step: 111130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:09,290-Speed 3411.13 samples/sec Loss 1.9313 LearningRate 0.0029 Epoch: 21 Global Step: 111140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:12,268-Speed 3439.90 samples/sec Loss 2.0102 LearningRate 0.0029 Epoch: 21 Global Step: 111150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:15,268-Speed 3414.05 samples/sec Loss 1.9972 LearningRate 0.0029 Epoch: 21 Global Step: 111160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:18,247-Speed 3438.39 samples/sec Loss 2.0005 LearningRate 0.0029 Epoch: 21 Global Step: 111170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:21,269-Speed 3389.20 samples/sec Loss 2.0077 LearningRate 0.0029 Epoch: 21 Global Step: 111180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:24,256-Speed 3429.78 samples/sec Loss 2.0196 LearningRate 0.0029 Epoch: 21 Global Step: 111190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:27,332-Speed 3329.73 samples/sec Loss 2.0089 LearningRate 0.0029 Epoch: 21 Global Step: 111200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:30,385-Speed 3354.95 samples/sec Loss 1.8794 LearningRate 0.0029 Epoch: 21 Global Step: 111210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:24:33,382-Speed 3417.01 samples/sec Loss 1.9139 LearningRate 0.0029 Epoch: 21 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:24:36,420-Speed 3371.64 samples/sec Loss 1.9502 LearningRate 0.0029 Epoch: 21 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:24:39,472-Speed 3356.04 samples/sec Loss 
2.0688 LearningRate 0.0029 Epoch: 21 Global Step: 111240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:24:42,513-Speed 3368.32 samples/sec Loss 2.0234 LearningRate 0.0029 Epoch: 21 Global Step: 111250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:24:45,509-Speed 3418.57 samples/sec Loss 2.0191 LearningRate 0.0029 Epoch: 21 Global Step: 111260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:24:48,568-Speed 3349.43 samples/sec Loss 1.9214 LearningRate 0.0029 Epoch: 21 Global Step: 111270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:25:03,388-Speed 690.98 samples/sec Loss 1.8364 LearningRate 0.0029 Epoch: 22 Global Step: 111280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:25:06,445-Speed 3351.71 samples/sec Loss 1.4629 LearningRate 0.0029 Epoch: 22 Global Step: 111290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:25:09,472-Speed 3382.82 samples/sec Loss 1.4114 LearningRate 0.0029 Epoch: 22 Global Step: 111300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:25:12,474-Speed 3412.61 samples/sec Loss 1.4294 LearningRate 0.0029 Epoch: 22 Global Step: 111310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:25:15,472-Speed 3417.00 samples/sec Loss 1.4083 LearningRate 0.0029 Epoch: 22 Global Step: 111320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-20 04:25:18,450-Speed 3438.99 samples/sec Loss 1.3769 LearningRate 0.0029 Epoch: 22 Global Step: 111330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:25:21,418-Speed 3450.60 samples/sec Loss 1.4211 LearningRate 0.0029 Epoch: 22 Global Step: 111340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:24,401-Speed 3434.26 samples/sec Loss 1.3754 LearningRate 0.0029 Epoch: 22 Global Step: 111350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:27,520-Speed 3283.94 samples/sec Loss 1.4197 LearningRate 0.0028 Epoch: 22 Global 
Step: 111360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:30,641-Speed 3282.30 samples/sec Loss 1.4565 LearningRate 0.0028 Epoch: 22 Global Step: 111370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:33,664-Speed 3387.92 samples/sec Loss 1.3921 LearningRate 0.0028 Epoch: 22 Global Step: 111380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:36,654-Speed 3425.96 samples/sec Loss 1.4209 LearningRate 0.0028 Epoch: 22 Global Step: 111390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:39,681-Speed 3384.62 samples/sec Loss 1.4190 LearningRate 0.0028 Epoch: 22 Global Step: 111400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:42,806-Speed 3276.62 samples/sec Loss 1.4814 LearningRate 0.0028 Epoch: 22 Global Step: 111410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:45,828-Speed 3389.03 samples/sec Loss 1.3910 LearningRate 0.0028 Epoch: 22 Global Step: 111420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:48,837-Speed 3405.35 samples/sec Loss 1.4979 LearningRate 0.0028 Epoch: 22 Global Step: 111430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:25:51,817-Speed 3437.39 samples/sec Loss 1.5054 LearningRate 0.0028 Epoch: 22 Global Step: 111440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:25:54,941-Speed 3277.99 samples/sec Loss 1.3471 LearningRate 0.0028 Epoch: 22 Global Step: 111450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:25:57,960-Speed 3392.88 samples/sec Loss 1.5138 LearningRate 0.0028 Epoch: 22 Global Step: 111460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:26:00,960-Speed 3414.19 samples/sec Loss 1.3364 LearningRate 0.0028 Epoch: 22 Global Step: 111470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:26:03,930-Speed 3449.27 samples/sec Loss 1.4454 LearningRate 0.0028 Epoch: 22 Global Step: 111480 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-20 04:26:06,921-Speed 3424.36 samples/sec Loss 1.4265 LearningRate 0.0028 Epoch: 22 Global Step: 111490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:09,908-Speed 3433.62 samples/sec Loss 1.4474 LearningRate 0.0028 Epoch: 22 Global Step: 111500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:12,893-Speed 3431.45 samples/sec Loss 1.4370 LearningRate 0.0028 Epoch: 22 Global Step: 111510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:15,878-Speed 3430.99 samples/sec Loss 1.4661 LearningRate 0.0028 Epoch: 22 Global Step: 111520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:18,911-Speed 3376.96 samples/sec Loss 1.3703 LearningRate 0.0028 Epoch: 22 Global Step: 111530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:21,957-Speed 3362.98 samples/sec Loss 1.3969 LearningRate 0.0028 Epoch: 22 Global Step: 111540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:25,008-Speed 3357.19 samples/sec Loss 1.4175 LearningRate 0.0028 Epoch: 22 Global Step: 111550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:27,997-Speed 3426.63 samples/sec Loss 1.4844 LearningRate 0.0028 Epoch: 22 Global Step: 111560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:31,004-Speed 3406.57 samples/sec Loss 1.4302 LearningRate 0.0028 Epoch: 22 Global Step: 111570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:34,055-Speed 3357.05 samples/sec Loss 1.4677 LearningRate 0.0028 Epoch: 22 Global Step: 111580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:26:37,067-Speed 3400.84 samples/sec Loss 1.4937 LearningRate 0.0028 Epoch: 22 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:26:40,097-Speed 3380.75 samples/sec Loss 1.4434 LearningRate 0.0028 Epoch: 22 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 
04:26:43,123-Speed 3385.02 samples/sec Loss 1.4498 LearningRate 0.0028 Epoch: 22 Global Step: 111610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:46,107-Speed 3431.78 samples/sec Loss 1.4735 LearningRate 0.0028 Epoch: 22 Global Step: 111620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:49,103-Speed 3419.87 samples/sec Loss 1.4345 LearningRate 0.0027 Epoch: 22 Global Step: 111630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:52,094-Speed 3423.20 samples/sec Loss 1.4955 LearningRate 0.0027 Epoch: 22 Global Step: 111640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:55,078-Speed 3432.88 samples/sec Loss 1.4957 LearningRate 0.0027 Epoch: 22 Global Step: 111650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:26:58,057-Speed 3438.69 samples/sec Loss 1.5530 LearningRate 0.0027 Epoch: 22 Global Step: 111660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:01,075-Speed 3393.39 samples/sec Loss 1.4592 LearningRate 0.0027 Epoch: 22 Global Step: 111670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:04,071-Speed 3419.68 samples/sec Loss 1.4620 LearningRate 0.0027 Epoch: 22 Global Step: 111680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:07,154-Speed 3322.38 samples/sec Loss 1.4073 LearningRate 0.0027 Epoch: 22 Global Step: 111690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:10,139-Speed 3432.19 samples/sec Loss 1.4407 LearningRate 0.0027 Epoch: 22 Global Step: 111700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:13,227-Speed 3317.35 samples/sec Loss 1.5195 LearningRate 0.0027 Epoch: 22 Global Step: 111710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-20 04:27:16,230-Speed 3410.31 samples/sec Loss 1.4212 LearningRate 0.0027 Epoch: 22 Global Step: 111720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:19,237-Speed 3405.86 samples/sec Loss 
1.5114 LearningRate 0.0027 Epoch: 22 Global Step: 111730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:22,221-Speed 3434.36 samples/sec Loss 1.5737 LearningRate 0.0027 Epoch: 22 Global Step: 111740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:25,271-Speed 3357.71 samples/sec Loss 1.4864 LearningRate 0.0027 Epoch: 22 Global Step: 111750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:28,252-Speed 3435.84 samples/sec Loss 1.5208 LearningRate 0.0027 Epoch: 22 Global Step: 111760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-20 04:27:31,252-Speed 3414.08 samples/sec Loss 1.4409 LearningRate 0.0027 Epoch: 22 Global Step: 111770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:27:34,237-Speed 3432.00 samples/sec Loss 1.4322 LearningRate 0.0027 Epoch: 22 Global Step: 111780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:27:37,218-Speed 3436.35 samples/sec Loss 1.5723 LearningRate 0.0027 Epoch: 22 Global Step: 111790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:27:40,185-Speed 3452.76 samples/sec Loss 1.4020 LearningRate 0.0027 Epoch: 22 Global Step: 111800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:27:43,168-Speed 3433.63 samples/sec Loss 1.4587 LearningRate 0.0027 Epoch: 22 Global Step: 111810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:27:46,159-Speed 3424.20 samples/sec Loss 1.4943 LearningRate 0.0027 Epoch: 22 Global Step: 111820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:27:49,141-Speed 3434.95 samples/sec Loss 1.3902 LearningRate 0.0027 Epoch: 22 Global Step: 111830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:27:52,130-Speed 3426.71 samples/sec Loss 1.5118 LearningRate 0.0027 Epoch: 22 Global Step: 111840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:27:55,105-Speed 3442.35 samples/sec Loss 1.5245 LearningRate 0.0027 Epoch: 22 Global 
Step: 111850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:27:58,085-Speed 3437.27 samples/sec Loss 1.4136 LearningRate 0.0027 Epoch: 22 Global Step: 111860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:28:01,072-Speed 3430.06 samples/sec Loss 1.4710 LearningRate 0.0027 Epoch: 22 Global Step: 111870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:28:04,058-Speed 3429.75 samples/sec Loss 1.4452 LearningRate 0.0027 Epoch: 22 Global Step: 111880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:28:07,049-Speed 3424.64 samples/sec Loss 1.4604 LearningRate 0.0027 Epoch: 22 Global Step: 111890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:28:10,044-Speed 3419.50 samples/sec Loss 1.4206 LearningRate 0.0026 Epoch: 22 Global Step: 111900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:28:13,045-Speed 3413.52 samples/sec Loss 1.4455 LearningRate 0.0026 Epoch: 22 Global Step: 111910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:28:16,091-Speed 3362.14 samples/sec Loss 1.4951 LearningRate 0.0026 Epoch: 22 Global Step: 111920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:28:19,109-Speed 3394.25 samples/sec Loss 1.4572 LearningRate 0.0026 Epoch: 22 Global Step: 111930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:28:22,097-Speed 3427.36 samples/sec Loss 1.4839 LearningRate 0.0026 Epoch: 22 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:28:25,079-Speed 3435.58 samples/sec Loss 1.5239 LearningRate 0.0026 Epoch: 22 Global Step: 111950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:28:28,075-Speed 3419.22 samples/sec Loss 1.5403 LearningRate 0.0026 Epoch: 22 Global Step: 111960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:28:31,064-Speed 3426.93 samples/sec Loss 1.5042 LearningRate 0.0026 Epoch: 22 Global Step: 111970 Fp16 Grad Scale: 32768 
Required: 1 hours
Training: 2022-01-20 04:28:34,045-Speed 3435.38 samples/sec Loss 1.4694 LearningRate 0.0026 Epoch: 22 Global Step: 111980 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-20 04:28:37,027-Speed 3434.94 samples/sec Loss 1.4817 LearningRate 0.0026 Epoch: 22 Global Step: 111990 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-20 04:28:40,016-Speed 3426.82 samples/sec Loss 1.4502 LearningRate 0.0026 Epoch: 22 Global Step: 112000 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-01-20 04:29:22,843-[lfw][112000]XNorm: 22.387939
Training: 2022-01-20 04:29:22,843-[lfw][112000]Accuracy-Flip: 0.99817+-0.00252
Training: 2022-01-20 04:29:22,844-[lfw][112000]Accuracy-Highest: 0.99833
Training: 2022-01-20 04:30:12,700-[cfp_fp][112000]XNorm: 21.901736
Training: 2022-01-20 04:30:12,701-[cfp_fp][112000]Accuracy-Flip: 0.98900+-0.00362
Training: 2022-01-20 04:30:12,702-[cfp_fp][112000]Accuracy-Highest: 0.98929
Training: 2022-01-20 04:30:55,562-[agedb_30][112000]XNorm: 22.761641
Training: 2022-01-20 04:30:55,563-[agedb_30][112000]Accuracy-Flip: 0.98417+-0.00588
Training: 2022-01-20 04:30:55,564-[agedb_30][112000]Accuracy-Highest: 0.98433
Training: 2022-01-20 04:30:58,526-Speed 73.93 samples/sec Loss 1.4692 LearningRate 0.0026 Epoch: 22 Global Step: 112010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-20 04:31:01,500-Speed 3443.19 samples/sec Loss 1.4932 LearningRate 0.0026 Epoch: 22 Global Step: 112020 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-20 04:31:04,470-Speed 3448.44 samples/sec Loss 1.4674 LearningRate 0.0026 Epoch: 22 Global Step: 112030 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-20 04:31:07,445-Speed 3443.70 samples/sec Loss 1.4655 LearningRate 0.0026 Epoch: 22 Global Step: 112040 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-20 04:31:10,429-Speed 3433.61 samples/sec Loss 1.4303 LearningRate 0.0026 Epoch: 22 Global Step: 112050 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-20 04:31:13,402-Speed 3444.52 samples/sec Loss 1.3643 LearningRate 0.0026 Epoch: 22 Global Step: 112060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:16,383-Speed 3436.30 samples/sec Loss 1.4464 LearningRate 0.0026 Epoch: 22 Global Step: 112070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:19,393-Speed 3402.82 samples/sec Loss 1.4844 LearningRate 0.0026 Epoch: 22 Global Step: 112080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:22,381-Speed 3428.07 samples/sec Loss 1.5365 LearningRate 0.0026 Epoch: 22 Global Step: 112090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:25,394-Speed 3398.82 samples/sec Loss 1.4192 LearningRate 0.0026 Epoch: 22 Global Step: 112100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:28,377-Speed 3434.60 samples/sec Loss 1.4995 LearningRate 0.0026 Epoch: 22 Global Step: 112110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:31:31,357-Speed 3437.54 samples/sec Loss 1.4863 LearningRate 0.0026 Epoch: 22 Global Step: 112120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:31:34,340-Speed 3433.54 samples/sec Loss 1.5541 LearningRate 0.0026 Epoch: 22 Global Step: 112130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:31:37,335-Speed 3419.62 samples/sec Loss 1.5582 LearningRate 0.0026 Epoch: 22 Global Step: 112140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:31:40,312-Speed 3440.71 samples/sec Loss 1.4278 LearningRate 0.0026 Epoch: 22 Global Step: 112150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:31:43,276-Speed 3455.64 samples/sec Loss 1.4571 LearningRate 0.0026 Epoch: 22 Global Step: 112160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:46,254-Speed 3439.54 samples/sec Loss 1.5624 LearningRate 0.0026 Epoch: 22 Global Step: 112170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:49,230-Speed 
3441.14 samples/sec Loss 1.4604 LearningRate 0.0025 Epoch: 22 Global Step: 112180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:52,212-Speed 3435.99 samples/sec Loss 1.5209 LearningRate 0.0025 Epoch: 22 Global Step: 112190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:55,190-Speed 3438.75 samples/sec Loss 1.5692 LearningRate 0.0025 Epoch: 22 Global Step: 112200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:31:58,168-Speed 3440.06 samples/sec Loss 1.5154 LearningRate 0.0025 Epoch: 22 Global Step: 112210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:01,156-Speed 3428.41 samples/sec Loss 1.4968 LearningRate 0.0025 Epoch: 22 Global Step: 112220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:04,133-Speed 3440.35 samples/sec Loss 1.4575 LearningRate 0.0025 Epoch: 22 Global Step: 112230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:07,114-Speed 3436.50 samples/sec Loss 1.5154 LearningRate 0.0025 Epoch: 22 Global Step: 112240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:10,118-Speed 3409.44 samples/sec Loss 1.5651 LearningRate 0.0025 Epoch: 22 Global Step: 112250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:13,106-Speed 3427.79 samples/sec Loss 1.4923 LearningRate 0.0025 Epoch: 22 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:32:16,085-Speed 3437.86 samples/sec Loss 1.5050 LearningRate 0.0025 Epoch: 22 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:32:19,062-Speed 3440.92 samples/sec Loss 1.4446 LearningRate 0.0025 Epoch: 22 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:32:22,058-Speed 3419.53 samples/sec Loss 1.4110 LearningRate 0.0025 Epoch: 22 Global Step: 112290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:32:25,035-Speed 3440.69 samples/sec Loss 1.5250 
LearningRate 0.0025 Epoch: 22 Global Step: 112300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:32:28,074-Speed 3370.75 samples/sec Loss 1.5183 LearningRate 0.0025 Epoch: 22 Global Step: 112310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:32:31,084-Speed 3403.14 samples/sec Loss 1.4504 LearningRate 0.0025 Epoch: 22 Global Step: 112320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:32:34,225-Speed 3259.91 samples/sec Loss 1.5655 LearningRate 0.0025 Epoch: 22 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:32:37,218-Speed 3423.78 samples/sec Loss 1.4175 LearningRate 0.0025 Epoch: 22 Global Step: 112340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:40,323-Speed 3297.99 samples/sec Loss 1.5731 LearningRate 0.0025 Epoch: 22 Global Step: 112350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:43,494-Speed 3230.06 samples/sec Loss 1.5715 LearningRate 0.0025 Epoch: 22 Global Step: 112360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:46,531-Speed 3372.57 samples/sec Loss 1.4920 LearningRate 0.0025 Epoch: 22 Global Step: 112370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:49,619-Speed 3317.18 samples/sec Loss 1.4667 LearningRate 0.0025 Epoch: 22 Global Step: 112380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:52,701-Speed 3323.66 samples/sec Loss 1.4539 LearningRate 0.0025 Epoch: 22 Global Step: 112390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:55,756-Speed 3352.97 samples/sec Loss 1.4430 LearningRate 0.0025 Epoch: 22 Global Step: 112400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:32:58,742-Speed 3429.54 samples/sec Loss 1.5049 LearningRate 0.0025 Epoch: 22 Global Step: 112410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:01,766-Speed 3387.20 samples/sec Loss 1.5456 LearningRate 0.0025 Epoch: 22 Global Step: 
112420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:04,899-Speed 3268.95 samples/sec Loss 1.4719 LearningRate 0.0025 Epoch: 22 Global Step: 112430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:07,945-Speed 3363.20 samples/sec Loss 1.5050 LearningRate 0.0025 Epoch: 22 Global Step: 112440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:33:10,991-Speed 3362.62 samples/sec Loss 1.3800 LearningRate 0.0025 Epoch: 22 Global Step: 112450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:14,025-Speed 3374.92 samples/sec Loss 1.4720 LearningRate 0.0024 Epoch: 22 Global Step: 112460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:17,045-Speed 3392.35 samples/sec Loss 1.5360 LearningRate 0.0024 Epoch: 22 Global Step: 112470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:20,108-Speed 3344.46 samples/sec Loss 1.5055 LearningRate 0.0024 Epoch: 22 Global Step: 112480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:23,161-Speed 3354.82 samples/sec Loss 1.5008 LearningRate 0.0024 Epoch: 22 Global Step: 112490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:26,148-Speed 3429.10 samples/sec Loss 1.5931 LearningRate 0.0024 Epoch: 22 Global Step: 112500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:29,145-Speed 3417.92 samples/sec Loss 1.4904 LearningRate 0.0024 Epoch: 22 Global Step: 112510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:32,137-Speed 3422.56 samples/sec Loss 1.4890 LearningRate 0.0024 Epoch: 22 Global Step: 112520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:35,151-Speed 3398.67 samples/sec Loss 1.6856 LearningRate 0.0024 Epoch: 22 Global Step: 112530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:38,147-Speed 3419.47 samples/sec Loss 1.5381 LearningRate 0.0024 Epoch: 22 Global Step: 112540 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-01-20 04:33:41,117-Speed 3447.46 samples/sec Loss 1.4889 LearningRate 0.0024 Epoch: 22 Global Step: 112550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:44,115-Speed 3417.17 samples/sec Loss 1.4999 LearningRate 0.0024 Epoch: 22 Global Step: 112560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:47,091-Speed 3442.40 samples/sec Loss 1.4655 LearningRate 0.0024 Epoch: 22 Global Step: 112570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:50,154-Speed 3343.19 samples/sec Loss 1.4702 LearningRate 0.0024 Epoch: 22 Global Step: 112580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:53,165-Speed 3402.32 samples/sec Loss 1.4872 LearningRate 0.0024 Epoch: 22 Global Step: 112590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:56,238-Speed 3333.65 samples/sec Loss 1.4322 LearningRate 0.0024 Epoch: 22 Global Step: 112600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:33:59,330-Speed 3312.49 samples/sec Loss 1.3898 LearningRate 0.0024 Epoch: 22 Global Step: 112610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:02,394-Speed 3342.63 samples/sec Loss 1.5880 LearningRate 0.0024 Epoch: 22 Global Step: 112620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:05,419-Speed 3385.30 samples/sec Loss 1.5720 LearningRate 0.0024 Epoch: 22 Global Step: 112630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:08,397-Speed 3439.55 samples/sec Loss 1.5884 LearningRate 0.0024 Epoch: 22 Global Step: 112640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:11,382-Speed 3431.39 samples/sec Loss 1.4706 LearningRate 0.0024 Epoch: 22 Global Step: 112650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:34:14,360-Speed 3440.02 samples/sec Loss 1.4376 LearningRate 0.0024 Epoch: 22 Global Step: 112660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 
04:34:17,338-Speed 3439.56 samples/sec Loss 1.5229 LearningRate 0.0024 Epoch: 22 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:34:20,368-Speed 3380.00 samples/sec Loss 1.5680 LearningRate 0.0024 Epoch: 22 Global Step: 112680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:23,434-Speed 3341.39 samples/sec Loss 1.4833 LearningRate 0.0024 Epoch: 22 Global Step: 112690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:26,523-Speed 3315.94 samples/sec Loss 1.5475 LearningRate 0.0024 Epoch: 22 Global Step: 112700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:29,507-Speed 3431.79 samples/sec Loss 1.4646 LearningRate 0.0024 Epoch: 22 Global Step: 112710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:32,531-Speed 3388.00 samples/sec Loss 1.5335 LearningRate 0.0024 Epoch: 22 Global Step: 112720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:35,563-Speed 3377.91 samples/sec Loss 1.4451 LearningRate 0.0024 Epoch: 22 Global Step: 112730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:38,561-Speed 3417.42 samples/sec Loss 1.5120 LearningRate 0.0024 Epoch: 22 Global Step: 112740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:41,730-Speed 3232.45 samples/sec Loss 1.4886 LearningRate 0.0023 Epoch: 22 Global Step: 112750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:44,788-Speed 3349.58 samples/sec Loss 1.5387 LearningRate 0.0023 Epoch: 22 Global Step: 112760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:47,772-Speed 3433.68 samples/sec Loss 1.5706 LearningRate 0.0023 Epoch: 22 Global Step: 112770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:50,799-Speed 3383.65 samples/sec Loss 1.4497 LearningRate 0.0023 Epoch: 22 Global Step: 112780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:53,783-Speed 3432.61 samples/sec Loss 
1.5369 LearningRate 0.0023 Epoch: 22 Global Step: 112790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:56,881-Speed 3305.93 samples/sec Loss 1.5145 LearningRate 0.0023 Epoch: 22 Global Step: 112800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:34:59,894-Speed 3399.96 samples/sec Loss 1.4803 LearningRate 0.0023 Epoch: 22 Global Step: 112810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:02,918-Speed 3386.35 samples/sec Loss 1.4088 LearningRate 0.0023 Epoch: 22 Global Step: 112820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:05,896-Speed 3439.45 samples/sec Loss 1.4543 LearningRate 0.0023 Epoch: 22 Global Step: 112830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:09,008-Speed 3292.71 samples/sec Loss 1.4918 LearningRate 0.0023 Epoch: 22 Global Step: 112840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:12,102-Speed 3310.55 samples/sec Loss 1.5372 LearningRate 0.0023 Epoch: 22 Global Step: 112850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:15,165-Speed 3343.94 samples/sec Loss 1.5046 LearningRate 0.0023 Epoch: 22 Global Step: 112860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:18,191-Speed 3384.37 samples/sec Loss 1.4679 LearningRate 0.0023 Epoch: 22 Global Step: 112870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:21,373-Speed 3218.69 samples/sec Loss 1.5145 LearningRate 0.0023 Epoch: 22 Global Step: 112880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:35:24,462-Speed 3316.18 samples/sec Loss 1.4304 LearningRate 0.0023 Epoch: 22 Global Step: 112890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:27,506-Speed 3364.71 samples/sec Loss 1.5454 LearningRate 0.0023 Epoch: 22 Global Step: 112900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:30,585-Speed 3326.20 samples/sec Loss 1.5245 LearningRate 0.0023 Epoch: 22 Global 
Step: 112910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:33,570-Speed 3431.84 samples/sec Loss 1.5313 LearningRate 0.0023 Epoch: 22 Global Step: 112920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:36,565-Speed 3419.79 samples/sec Loss 1.5510 LearningRate 0.0023 Epoch: 22 Global Step: 112930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:39,541-Speed 3441.64 samples/sec Loss 1.5410 LearningRate 0.0023 Epoch: 22 Global Step: 112940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:42,521-Speed 3437.65 samples/sec Loss 1.4525 LearningRate 0.0023 Epoch: 22 Global Step: 112950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:45,502-Speed 3436.37 samples/sec Loss 1.5029 LearningRate 0.0023 Epoch: 22 Global Step: 112960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:48,479-Speed 3440.30 samples/sec Loss 1.4783 LearningRate 0.0023 Epoch: 22 Global Step: 112970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:51,463-Speed 3432.86 samples/sec Loss 1.5672 LearningRate 0.0023 Epoch: 22 Global Step: 112980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:35:54,453-Speed 3424.98 samples/sec Loss 1.4391 LearningRate 0.0023 Epoch: 22 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:35:57,479-Speed 3384.91 samples/sec Loss 1.4612 LearningRate 0.0023 Epoch: 22 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:36:00,552-Speed 3333.26 samples/sec Loss 1.5912 LearningRate 0.0023 Epoch: 22 Global Step: 113010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:36:03,526-Speed 3445.17 samples/sec Loss 1.5658 LearningRate 0.0023 Epoch: 22 Global Step: 113020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:06,509-Speed 3433.90 samples/sec Loss 1.5357 LearningRate 0.0023 Epoch: 22 Global Step: 113030 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-20 04:36:09,577-Speed 3338.69 samples/sec Loss 1.5326 LearningRate 0.0022 Epoch: 22 Global Step: 113040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:12,586-Speed 3403.89 samples/sec Loss 1.4869 LearningRate 0.0022 Epoch: 22 Global Step: 113050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:15,563-Speed 3440.54 samples/sec Loss 1.5006 LearningRate 0.0022 Epoch: 22 Global Step: 113060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:18,544-Speed 3435.68 samples/sec Loss 1.5531 LearningRate 0.0022 Epoch: 22 Global Step: 113070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:21,645-Speed 3303.04 samples/sec Loss 1.5719 LearningRate 0.0022 Epoch: 22 Global Step: 113080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:24,622-Speed 3439.79 samples/sec Loss 1.6057 LearningRate 0.0022 Epoch: 22 Global Step: 113090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:27,606-Speed 3434.01 samples/sec Loss 1.5951 LearningRate 0.0022 Epoch: 22 Global Step: 113100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:30,593-Speed 3428.61 samples/sec Loss 1.6131 LearningRate 0.0022 Epoch: 22 Global Step: 113110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:33,578-Speed 3432.00 samples/sec Loss 1.4879 LearningRate 0.0022 Epoch: 22 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:36:36,571-Speed 3421.99 samples/sec Loss 1.5564 LearningRate 0.0022 Epoch: 22 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:36:39,546-Speed 3442.16 samples/sec Loss 1.5960 LearningRate 0.0022 Epoch: 22 Global Step: 113140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:42,540-Speed 3421.06 samples/sec Loss 1.5092 LearningRate 0.0022 Epoch: 22 Global Step: 113150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 
04:36:45,530-Speed 3426.37 samples/sec Loss 1.5130 LearningRate 0.0022 Epoch: 22 Global Step: 113160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:48,512-Speed 3433.99 samples/sec Loss 1.3912 LearningRate 0.0022 Epoch: 22 Global Step: 113170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:51,498-Speed 3431.13 samples/sec Loss 1.4634 LearningRate 0.0022 Epoch: 22 Global Step: 113180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:36:54,465-Speed 3451.57 samples/sec Loss 1.5420 LearningRate 0.0022 Epoch: 22 Global Step: 113190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:36:57,486-Speed 3391.01 samples/sec Loss 1.5700 LearningRate 0.0022 Epoch: 22 Global Step: 113200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:00,479-Speed 3421.98 samples/sec Loss 1.5420 LearningRate 0.0022 Epoch: 22 Global Step: 113210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:03,483-Speed 3410.50 samples/sec Loss 1.4518 LearningRate 0.0022 Epoch: 22 Global Step: 113220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:06,486-Speed 3410.19 samples/sec Loss 1.4456 LearningRate 0.0022 Epoch: 22 Global Step: 113230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:09,549-Speed 3344.50 samples/sec Loss 1.5029 LearningRate 0.0022 Epoch: 22 Global Step: 113240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:12,551-Speed 3411.61 samples/sec Loss 1.5074 LearningRate 0.0022 Epoch: 22 Global Step: 113250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:15,586-Speed 3375.02 samples/sec Loss 1.5457 LearningRate 0.0022 Epoch: 22 Global Step: 113260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:18,600-Speed 3397.63 samples/sec Loss 1.4475 LearningRate 0.0022 Epoch: 22 Global Step: 113270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:21,607-Speed 3406.22 samples/sec Loss 
1.4583 LearningRate 0.0022 Epoch: 22 Global Step: 113280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:37:24,619-Speed 3401.04 samples/sec Loss 1.5013 LearningRate 0.0022 Epoch: 22 Global Step: 113290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:27,662-Speed 3366.39 samples/sec Loss 1.5143 LearningRate 0.0022 Epoch: 22 Global Step: 113300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:30,645-Speed 3433.17 samples/sec Loss 1.5848 LearningRate 0.0022 Epoch: 22 Global Step: 113310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:33,727-Speed 3323.90 samples/sec Loss 1.4683 LearningRate 0.0022 Epoch: 22 Global Step: 113320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:36,741-Speed 3397.66 samples/sec Loss 1.4775 LearningRate 0.0022 Epoch: 22 Global Step: 113330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:39,744-Speed 3411.29 samples/sec Loss 1.5236 LearningRate 0.0021 Epoch: 22 Global Step: 113340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:42,738-Speed 3422.34 samples/sec Loss 1.5877 LearningRate 0.0021 Epoch: 22 Global Step: 113350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:45,715-Speed 3439.69 samples/sec Loss 1.5753 LearningRate 0.0021 Epoch: 22 Global Step: 113360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:48,748-Speed 3377.27 samples/sec Loss 1.4679 LearningRate 0.0021 Epoch: 22 Global Step: 113370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:51,808-Speed 3347.78 samples/sec Loss 1.5311 LearningRate 0.0021 Epoch: 22 Global Step: 113380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:37:54,831-Speed 3388.19 samples/sec Loss 1.5399 LearningRate 0.0021 Epoch: 22 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:37:57,869-Speed 3371.72 samples/sec Loss 1.4723 LearningRate 0.0021 Epoch: 22 Global 
Step: 113400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:38:00,878-Speed 3403.02 samples/sec Loss 1.4590 LearningRate 0.0021 Epoch: 22 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:38:03,880-Speed 3412.62 samples/sec Loss 1.4669 LearningRate 0.0021 Epoch: 22 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:38:06,858-Speed 3438.88 samples/sec Loss 1.4715 LearningRate 0.0021 Epoch: 22 Global Step: 113430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:09,891-Speed 3377.73 samples/sec Loss 1.5297 LearningRate 0.0021 Epoch: 22 Global Step: 113440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:12,876-Speed 3430.66 samples/sec Loss 1.5666 LearningRate 0.0021 Epoch: 22 Global Step: 113450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:15,905-Speed 3381.80 samples/sec Loss 1.5222 LearningRate 0.0021 Epoch: 22 Global Step: 113460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:18,888-Speed 3434.40 samples/sec Loss 1.4906 LearningRate 0.0021 Epoch: 22 Global Step: 113470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:21,870-Speed 3434.54 samples/sec Loss 1.5377 LearningRate 0.0021 Epoch: 22 Global Step: 113480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:24,856-Speed 3429.86 samples/sec Loss 1.5409 LearningRate 0.0021 Epoch: 22 Global Step: 113490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:27,849-Speed 3422.69 samples/sec Loss 1.4313 LearningRate 0.0021 Epoch: 22 Global Step: 113500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:30,893-Speed 3364.97 samples/sec Loss 1.5922 LearningRate 0.0021 Epoch: 22 Global Step: 113510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:38:33,923-Speed 3380.31 samples/sec Loss 1.4179 LearningRate 0.0021 Epoch: 22 Global Step: 113520 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-20 04:38:36,971-Speed 3359.64 samples/sec Loss 1.5392 LearningRate 0.0021 Epoch: 22 Global Step: 113530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:38:40,079-Speed 3296.37 samples/sec Loss 1.4730 LearningRate 0.0021 Epoch: 22 Global Step: 113540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:38:43,073-Speed 3421.09 samples/sec Loss 1.5841 LearningRate 0.0021 Epoch: 22 Global Step: 113550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:38:46,095-Speed 3389.60 samples/sec Loss 1.4508 LearningRate 0.0021 Epoch: 22 Global Step: 113560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:38:49,099-Speed 3409.50 samples/sec Loss 1.4038 LearningRate 0.0021 Epoch: 22 Global Step: 113570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:38:52,094-Speed 3419.62 samples/sec Loss 1.5437 LearningRate 0.0021 Epoch: 22 Global Step: 113580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:38:55,210-Speed 3287.55 samples/sec Loss 1.4780 LearningRate 0.0021 Epoch: 22 Global Step: 113590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:38:58,243-Speed 3376.26 samples/sec Loss 1.6189 LearningRate 0.0021 Epoch: 22 Global Step: 113600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:39:01,246-Speed 3411.61 samples/sec Loss 1.4915 LearningRate 0.0021 Epoch: 22 Global Step: 113610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:39:04,223-Speed 3440.34 samples/sec Loss 1.4610 LearningRate 0.0021 Epoch: 22 Global Step: 113620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:39:07,226-Speed 3410.62 samples/sec Loss 1.4896 LearningRate 0.0021 Epoch: 22 Global Step: 113630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:39:10,203-Speed 3439.97 samples/sec Loss 1.4681 LearningRate 0.0021 Epoch: 22 Global Step: 113640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 
04:39:13,180-Speed 3441.49 samples/sec Loss 1.5642 LearningRate 0.0020 Epoch: 22 Global Step: 113650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:16,162-Speed 3435.18 samples/sec Loss 1.4876 LearningRate 0.0020 Epoch: 22 Global Step: 113660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:19,180-Speed 3393.89 samples/sec Loss 1.5561 LearningRate 0.0020 Epoch: 22 Global Step: 113670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:22,269-Speed 3315.99 samples/sec Loss 1.5533 LearningRate 0.0020 Epoch: 22 Global Step: 113680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:25,254-Speed 3430.59 samples/sec Loss 1.5073 LearningRate 0.0020 Epoch: 22 Global Step: 113690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:28,303-Speed 3363.96 samples/sec Loss 1.4581 LearningRate 0.0020 Epoch: 22 Global Step: 113700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:31,342-Speed 3369.54 samples/sec Loss 1.5184 LearningRate 0.0020 Epoch: 22 Global Step: 113710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:34,392-Speed 3358.88 samples/sec Loss 1.5166 LearningRate 0.0020 Epoch: 22 Global Step: 113720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:37,433-Speed 3368.49 samples/sec Loss 1.4758 LearningRate 0.0020 Epoch: 22 Global Step: 113730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:40,578-Speed 3256.60 samples/sec Loss 1.4565 LearningRate 0.0020 Epoch: 22 Global Step: 113740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:43,668-Speed 3314.44 samples/sec Loss 1.5719 LearningRate 0.0020 Epoch: 22 Global Step: 113750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:39:46,647-Speed 3438.85 samples/sec Loss 1.5344 LearningRate 0.0020 Epoch: 22 Global Step: 113760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:39:49,638-Speed 3423.68 samples/sec Loss 
1.5259 LearningRate 0.0020 Epoch: 22 Global Step: 113770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:39:52,607-Speed 3450.93 samples/sec Loss 1.4227 LearningRate 0.0020 Epoch: 22 Global Step: 113780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:55,592-Speed 3431.19 samples/sec Loss 1.4414 LearningRate 0.0020 Epoch: 22 Global Step: 113790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:39:58,660-Speed 3338.06 samples/sec Loss 1.5032 LearningRate 0.0020 Epoch: 22 Global Step: 113800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:01,641-Speed 3435.69 samples/sec Loss 1.5069 LearningRate 0.0020 Epoch: 22 Global Step: 113810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:04,742-Speed 3303.59 samples/sec Loss 1.4336 LearningRate 0.0020 Epoch: 22 Global Step: 113820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:07,805-Speed 3343.75 samples/sec Loss 1.5093 LearningRate 0.0020 Epoch: 22 Global Step: 113830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:10,959-Speed 3247.73 samples/sec Loss 1.4267 LearningRate 0.0020 Epoch: 22 Global Step: 113840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:14,013-Speed 3354.29 samples/sec Loss 1.5501 LearningRate 0.0020 Epoch: 22 Global Step: 113850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:17,101-Speed 3316.56 samples/sec Loss 1.6150 LearningRate 0.0020 Epoch: 22 Global Step: 113860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:20,165-Speed 3342.47 samples/sec Loss 1.5278 LearningRate 0.0020 Epoch: 22 Global Step: 113870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:23,246-Speed 3324.34 samples/sec Loss 1.4991 LearningRate 0.0020 Epoch: 22 Global Step: 113880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:40:26,246-Speed 3415.10 samples/sec Loss 1.5315 LearningRate 0.0020 Epoch: 22 Global 
Step: 113890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:40:29,300-Speed 3353.20 samples/sec Loss 1.5071 LearningRate 0.0020 Epoch: 22 Global Step: 113900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:40:32,276-Speed 3442.42 samples/sec Loss 1.5494 LearningRate 0.0020 Epoch: 22 Global Step: 113910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:40:35,241-Speed 3454.64 samples/sec Loss 1.4861 LearningRate 0.0020 Epoch: 22 Global Step: 113920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:40:38,245-Speed 3410.00 samples/sec Loss 1.4913 LearningRate 0.0020 Epoch: 22 Global Step: 113930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:40:41,306-Speed 3346.38 samples/sec Loss 1.5377 LearningRate 0.0020 Epoch: 22 Global Step: 113940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:40:44,286-Speed 3436.34 samples/sec Loss 1.5408 LearningRate 0.0020 Epoch: 22 Global Step: 113950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:40:47,308-Speed 3389.72 samples/sec Loss 1.5345 LearningRate 0.0020 Epoch: 22 Global Step: 113960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:40:50,359-Speed 3356.83 samples/sec Loss 1.5872 LearningRate 0.0019 Epoch: 22 Global Step: 113970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:40:53,361-Speed 3411.94 samples/sec Loss 1.5119 LearningRate 0.0019 Epoch: 22 Global Step: 113980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:40:56,349-Speed 3427.96 samples/sec Loss 1.5766 LearningRate 0.0019 Epoch: 22 Global Step: 113990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:40:59,446-Speed 3307.58 samples/sec Loss 1.4322 LearningRate 0.0019 Epoch: 22 Global Step: 114000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:41:42,849-[lfw][114000]XNorm: 22.777789 Training: 2022-01-20 04:41:42,850-[lfw][114000]Accuracy-Flip: 0.99817+-0.00252 Training: 
2022-01-20 04:41:42,850-[lfw][114000]Accuracy-Highest: 0.99833 Training: 2022-01-20 04:42:33,206-[cfp_fp][114000]XNorm: 22.296773 Training: 2022-01-20 04:42:33,207-[cfp_fp][114000]Accuracy-Flip: 0.98957+-0.00452 Training: 2022-01-20 04:42:33,207-[cfp_fp][114000]Accuracy-Highest: 0.98957 Training: 2022-01-20 04:43:16,513-[agedb_30][114000]XNorm: 23.003092 Training: 2022-01-20 04:43:16,513-[agedb_30][114000]Accuracy-Flip: 0.98333+-0.00645 Training: 2022-01-20 04:43:16,514-[agedb_30][114000]Accuracy-Highest: 0.98433 Training: 2022-01-20 04:43:19,528-Speed 73.10 samples/sec Loss 1.4630 LearningRate 0.0019 Epoch: 22 Global Step: 114010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:43:22,511-Speed 3434.04 samples/sec Loss 1.4985 LearningRate 0.0019 Epoch: 22 Global Step: 114020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:25,532-Speed 3390.20 samples/sec Loss 1.4756 LearningRate 0.0019 Epoch: 22 Global Step: 114030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:28,520-Speed 3428.32 samples/sec Loss 1.5460 LearningRate 0.0019 Epoch: 22 Global Step: 114040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:31,498-Speed 3439.39 samples/sec Loss 1.4604 LearningRate 0.0019 Epoch: 22 Global Step: 114050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:34,479-Speed 3436.86 samples/sec Loss 1.3749 LearningRate 0.0019 Epoch: 22 Global Step: 114060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:37,477-Speed 3416.12 samples/sec Loss 1.4310 LearningRate 0.0019 Epoch: 22 Global Step: 114070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:40,467-Speed 3425.43 samples/sec Loss 1.4965 LearningRate 0.0019 Epoch: 22 Global Step: 114080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:43,442-Speed 3443.14 samples/sec Loss 1.5260 LearningRate 0.0019 Epoch: 22 Global Step: 114090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-20 04:43:46,419-Speed 3440.68 samples/sec Loss 1.5264 LearningRate 0.0019 Epoch: 22 Global Step: 114100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:49,421-Speed 3412.34 samples/sec Loss 1.5102 LearningRate 0.0019 Epoch: 22 Global Step: 114110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:43:52,404-Speed 3432.74 samples/sec Loss 1.5258 LearningRate 0.0019 Epoch: 22 Global Step: 114120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:43:55,408-Speed 3409.70 samples/sec Loss 1.5554 LearningRate 0.0019 Epoch: 22 Global Step: 114130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:43:58,427-Speed 3393.64 samples/sec Loss 1.5820 LearningRate 0.0019 Epoch: 22 Global Step: 114140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:44:01,515-Speed 3316.91 samples/sec Loss 1.4470 LearningRate 0.0019 Epoch: 22 Global Step: 114150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:44:04,478-Speed 3456.54 samples/sec Loss 1.5054 LearningRate 0.0019 Epoch: 22 Global Step: 114160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:07,630-Speed 3249.62 samples/sec Loss 1.5510 LearningRate 0.0019 Epoch: 22 Global Step: 114170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:10,733-Speed 3300.12 samples/sec Loss 1.4521 LearningRate 0.0019 Epoch: 22 Global Step: 114180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:13,712-Speed 3438.41 samples/sec Loss 1.4989 LearningRate 0.0019 Epoch: 22 Global Step: 114190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:16,737-Speed 3386.29 samples/sec Loss 1.4801 LearningRate 0.0019 Epoch: 22 Global Step: 114200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:19,720-Speed 3432.86 samples/sec Loss 1.4531 LearningRate 0.0019 Epoch: 22 Global Step: 114210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:22,744-Speed 3438.15 
samples/sec Loss 1.6416 LearningRate 0.0019 Epoch: 22 Global Step: 114220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:25,882-Speed 3264.07 samples/sec Loss 1.5510 LearningRate 0.0019 Epoch: 22 Global Step: 114230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:28,980-Speed 3366.43 samples/sec Loss 1.5605 LearningRate 0.0019 Epoch: 22 Global Step: 114240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:31,975-Speed 3418.66 samples/sec Loss 1.5660 LearningRate 0.0019 Epoch: 22 Global Step: 114250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:34,978-Speed 3411.00 samples/sec Loss 1.6206 LearningRate 0.0019 Epoch: 22 Global Step: 114260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:44:38,030-Speed 3418.49 samples/sec Loss 1.6191 LearningRate 0.0019 Epoch: 22 Global Step: 114270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:44:41,011-Speed 3436.48 samples/sec Loss 1.4525 LearningRate 0.0019 Epoch: 22 Global Step: 114280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:44:44,069-Speed 3435.85 samples/sec Loss 1.4915 LearningRate 0.0018 Epoch: 22 Global Step: 114290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:44:47,057-Speed 3428.17 samples/sec Loss 1.4881 LearningRate 0.0018 Epoch: 22 Global Step: 114300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:44:50,039-Speed 3434.90 samples/sec Loss 1.5692 LearningRate 0.0018 Epoch: 22 Global Step: 114310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:53,071-Speed 3378.72 samples/sec Loss 1.5309 LearningRate 0.0018 Epoch: 22 Global Step: 114320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:56,110-Speed 3370.73 samples/sec Loss 1.5991 LearningRate 0.0018 Epoch: 22 Global Step: 114330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:44:59,267-Speed 3244.27 samples/sec Loss 1.5250 LearningRate 0.0018 
Epoch: 22 Global Step: 114340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:02,317-Speed 3358.12 samples/sec Loss 1.4103 LearningRate 0.0018 Epoch: 22 Global Step: 114350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:05,299-Speed 3434.01 samples/sec Loss 1.5379 LearningRate 0.0018 Epoch: 22 Global Step: 114360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:08,290-Speed 3425.66 samples/sec Loss 1.5346 LearningRate 0.0018 Epoch: 22 Global Step: 114370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:11,269-Speed 3437.67 samples/sec Loss 1.4441 LearningRate 0.0018 Epoch: 22 Global Step: 114380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:14,261-Speed 3423.50 samples/sec Loss 1.5135 LearningRate 0.0018 Epoch: 22 Global Step: 114390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:17,279-Speed 3394.18 samples/sec Loss 1.5989 LearningRate 0.0018 Epoch: 22 Global Step: 114400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:20,260-Speed 3435.18 samples/sec Loss 1.4521 LearningRate 0.0018 Epoch: 22 Global Step: 114410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:45:23,231-Speed 3449.33 samples/sec Loss 1.4896 LearningRate 0.0018 Epoch: 22 Global Step: 114420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:26,243-Speed 3401.09 samples/sec Loss 1.5835 LearningRate 0.0018 Epoch: 22 Global Step: 114430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:29,244-Speed 3412.80 samples/sec Loss 1.4810 LearningRate 0.0018 Epoch: 22 Global Step: 114440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:32,220-Speed 3441.12 samples/sec Loss 1.4746 LearningRate 0.0018 Epoch: 22 Global Step: 114450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:35,214-Speed 3420.97 samples/sec Loss 1.4609 LearningRate 0.0018 Epoch: 22 Global Step: 114460 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:38,236-Speed 3390.46 samples/sec Loss 1.5447 LearningRate 0.0018 Epoch: 22 Global Step: 114470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:41,263-Speed 3383.73 samples/sec Loss 1.4762 LearningRate 0.0018 Epoch: 22 Global Step: 114480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:44,319-Speed 3351.63 samples/sec Loss 1.4611 LearningRate 0.0018 Epoch: 22 Global Step: 114490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:47,337-Speed 3394.38 samples/sec Loss 1.5224 LearningRate 0.0018 Epoch: 22 Global Step: 114500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:50,336-Speed 3414.58 samples/sec Loss 1.6830 LearningRate 0.0018 Epoch: 22 Global Step: 114510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:45:53,366-Speed 3381.25 samples/sec Loss 1.5187 LearningRate 0.0018 Epoch: 22 Global Step: 114520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:45:56,349-Speed 3433.41 samples/sec Loss 1.5806 LearningRate 0.0018 Epoch: 22 Global Step: 114530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:45:59,340-Speed 3424.06 samples/sec Loss 1.5616 LearningRate 0.0018 Epoch: 22 Global Step: 114540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:02,329-Speed 3426.90 samples/sec Loss 1.5056 LearningRate 0.0018 Epoch: 22 Global Step: 114550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:05,350-Speed 3390.15 samples/sec Loss 1.4681 LearningRate 0.0018 Epoch: 22 Global Step: 114560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:08,326-Speed 3442.05 samples/sec Loss 1.4984 LearningRate 0.0018 Epoch: 22 Global Step: 114570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:11,293-Speed 3451.69 samples/sec Loss 1.4829 LearningRate 0.0018 Epoch: 22 Global Step: 114580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-20 04:46:14,283-Speed 3426.94 samples/sec Loss 1.5984 LearningRate 0.0018 Epoch: 22 Global Step: 114590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:17,280-Speed 3417.45 samples/sec Loss 1.4599 LearningRate 0.0018 Epoch: 22 Global Step: 114600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:20,286-Speed 3406.81 samples/sec Loss 1.6150 LearningRate 0.0018 Epoch: 22 Global Step: 114610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:23,270-Speed 3433.13 samples/sec Loss 1.5273 LearningRate 0.0018 Epoch: 22 Global Step: 114620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:26,337-Speed 3339.42 samples/sec Loss 1.5354 LearningRate 0.0017 Epoch: 22 Global Step: 114630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:29,433-Speed 3308.35 samples/sec Loss 1.6542 LearningRate 0.0017 Epoch: 22 Global Step: 114640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:32,466-Speed 3376.96 samples/sec Loss 1.5496 LearningRate 0.0017 Epoch: 22 Global Step: 114650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:35,494-Speed 3381.78 samples/sec Loss 1.4782 LearningRate 0.0017 Epoch: 22 Global Step: 114660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:38,517-Speed 3388.85 samples/sec Loss 1.5212 LearningRate 0.0017 Epoch: 22 Global Step: 114670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:46:41,560-Speed 3366.63 samples/sec Loss 1.5218 LearningRate 0.0017 Epoch: 22 Global Step: 114680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:44,592-Speed 3377.46 samples/sec Loss 1.5345 LearningRate 0.0017 Epoch: 22 Global Step: 114690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:47,589-Speed 3417.70 samples/sec Loss 1.4764 LearningRate 0.0017 Epoch: 22 Global Step: 114700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:50,564-Speed 3443.30 
samples/sec Loss 1.4876 LearningRate 0.0017 Epoch: 22 Global Step: 114710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:53,543-Speed 3438.36 samples/sec Loss 1.4098 LearningRate 0.0017 Epoch: 22 Global Step: 114720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:56,523-Speed 3437.06 samples/sec Loss 1.4698 LearningRate 0.0017 Epoch: 22 Global Step: 114730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:46:59,516-Speed 3421.92 samples/sec Loss 1.4079 LearningRate 0.0017 Epoch: 22 Global Step: 114740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:02,501-Speed 3430.92 samples/sec Loss 1.4590 LearningRate 0.0017 Epoch: 22 Global Step: 114750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:05,486-Speed 3432.03 samples/sec Loss 1.5147 LearningRate 0.0017 Epoch: 22 Global Step: 114760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:08,448-Speed 3458.49 samples/sec Loss 1.5421 LearningRate 0.0017 Epoch: 22 Global Step: 114770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:11,561-Speed 3289.97 samples/sec Loss 1.4750 LearningRate 0.0017 Epoch: 22 Global Step: 114780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:14,569-Speed 3405.41 samples/sec Loss 1.5137 LearningRate 0.0017 Epoch: 22 Global Step: 114790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:17,578-Speed 3403.34 samples/sec Loss 1.4670 LearningRate 0.0017 Epoch: 22 Global Step: 114800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:20,658-Speed 3325.23 samples/sec Loss 1.5835 LearningRate 0.0017 Epoch: 22 Global Step: 114810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:23,658-Speed 3415.42 samples/sec Loss 1.4318 LearningRate 0.0017 Epoch: 22 Global Step: 114820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:26,672-Speed 3397.50 samples/sec Loss 1.5054 LearningRate 0.0017 
Epoch: 22 Global Step: 114830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:29,657-Speed 3431.64 samples/sec Loss 1.4602 LearningRate 0.0017 Epoch: 22 Global Step: 114840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:32,640-Speed 3433.83 samples/sec Loss 1.4706 LearningRate 0.0017 Epoch: 22 Global Step: 114850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:35,623-Speed 3433.79 samples/sec Loss 1.5714 LearningRate 0.0017 Epoch: 22 Global Step: 114860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:47:38,645-Speed 3389.43 samples/sec Loss 1.4971 LearningRate 0.0017 Epoch: 22 Global Step: 114870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:41,662-Speed 3394.57 samples/sec Loss 1.5639 LearningRate 0.0017 Epoch: 22 Global Step: 114880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:44,638-Speed 3442.25 samples/sec Loss 1.5561 LearningRate 0.0017 Epoch: 22 Global Step: 114890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:47,621-Speed 3434.07 samples/sec Loss 1.4751 LearningRate 0.0017 Epoch: 22 Global Step: 114900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:50,595-Speed 3443.19 samples/sec Loss 1.3917 LearningRate 0.0017 Epoch: 22 Global Step: 114910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:53,574-Speed 3438.32 samples/sec Loss 1.4704 LearningRate 0.0017 Epoch: 22 Global Step: 114920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:56,548-Speed 3444.15 samples/sec Loss 1.5539 LearningRate 0.0017 Epoch: 22 Global Step: 114930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:47:59,605-Speed 3351.10 samples/sec Loss 1.5068 LearningRate 0.0017 Epoch: 22 Global Step: 114940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:48:02,729-Speed 3277.75 samples/sec Loss 1.5026 LearningRate 0.0017 Epoch: 22 Global Step: 114950 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:05,705-Speed 3442.98 samples/sec Loss 1.5329 LearningRate 0.0017 Epoch: 22 Global Step: 114960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:08,682-Speed 3441.98 samples/sec Loss 1.5259 LearningRate 0.0016 Epoch: 22 Global Step: 114970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:11,756-Speed 3331.99 samples/sec Loss 1.4286 LearningRate 0.0016 Epoch: 22 Global Step: 114980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:14,783-Speed 3383.63 samples/sec Loss 1.5154 LearningRate 0.0016 Epoch: 22 Global Step: 114990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:17,746-Speed 3456.47 samples/sec Loss 1.5369 LearningRate 0.0016 Epoch: 22 Global Step: 115000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:20,739-Speed 3422.64 samples/sec Loss 1.5433 LearningRate 0.0016 Epoch: 22 Global Step: 115010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:23,720-Speed 3435.87 samples/sec Loss 1.6110 LearningRate 0.0016 Epoch: 22 Global Step: 115020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:26,750-Speed 3380.25 samples/sec Loss 1.5546 LearningRate 0.0016 Epoch: 22 Global Step: 115030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:29,741-Speed 3425.18 samples/sec Loss 1.4991 LearningRate 0.0016 Epoch: 22 Global Step: 115040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:32,722-Speed 3435.38 samples/sec Loss 1.5580 LearningRate 0.0016 Epoch: 22 Global Step: 115050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:35,698-Speed 3441.85 samples/sec Loss 1.5574 LearningRate 0.0016 Epoch: 22 Global Step: 115060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:38,682-Speed 3432.86 samples/sec Loss 1.5306 LearningRate 0.0016 Epoch: 22 Global Step: 115070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-01-20 04:48:41,665-Speed 3433.39 samples/sec Loss 1.5198 LearningRate 0.0016 Epoch: 22 Global Step: 115080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:44,679-Speed 3398.03 samples/sec Loss 1.4598 LearningRate 0.0016 Epoch: 22 Global Step: 115090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:48:47,787-Speed 3296.13 samples/sec Loss 1.5226 LearningRate 0.0016 Epoch: 22 Global Step: 115100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:50,803-Speed 3395.81 samples/sec Loss 1.4355 LearningRate 0.0016 Epoch: 22 Global Step: 115110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:53,829-Speed 3384.47 samples/sec Loss 1.6070 LearningRate 0.0016 Epoch: 22 Global Step: 115120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:56,833-Speed 3409.86 samples/sec Loss 1.6190 LearningRate 0.0016 Epoch: 22 Global Step: 115130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:48:59,879-Speed 3364.13 samples/sec Loss 1.5330 LearningRate 0.0016 Epoch: 22 Global Step: 115140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:02,884-Speed 3408.36 samples/sec Loss 1.6310 LearningRate 0.0016 Epoch: 22 Global Step: 115150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:05,863-Speed 3438.31 samples/sec Loss 1.5105 LearningRate 0.0016 Epoch: 22 Global Step: 115160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:08,860-Speed 3417.28 samples/sec Loss 1.4595 LearningRate 0.0016 Epoch: 22 Global Step: 115170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:11,938-Speed 3327.38 samples/sec Loss 1.4184 LearningRate 0.0016 Epoch: 22 Global Step: 115180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:14,925-Speed 3429.44 samples/sec Loss 1.4805 LearningRate 0.0016 Epoch: 22 Global Step: 115190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:17,969-Speed 3365.17 
samples/sec Loss 1.5582 LearningRate 0.0016 Epoch: 22 Global Step: 115200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:49:20,973-Speed 3409.15 samples/sec Loss 1.5288 LearningRate 0.0016 Epoch: 22 Global Step: 115210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:49:24,016-Speed 3365.93 samples/sec Loss 1.5931 LearningRate 0.0016 Epoch: 22 Global Step: 115220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:26,996-Speed 3437.65 samples/sec Loss 1.5798 LearningRate 0.0016 Epoch: 22 Global Step: 115230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:29,983-Speed 3429.63 samples/sec Loss 1.4718 LearningRate 0.0016 Epoch: 22 Global Step: 115240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:32,962-Speed 3438.89 samples/sec Loss 1.4872 LearningRate 0.0016 Epoch: 22 Global Step: 115250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:35,967-Speed 3407.71 samples/sec Loss 1.5462 LearningRate 0.0016 Epoch: 22 Global Step: 115260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:38,944-Speed 3441.27 samples/sec Loss 1.5141 LearningRate 0.0016 Epoch: 22 Global Step: 115270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:42,006-Speed 3344.98 samples/sec Loss 1.4265 LearningRate 0.0016 Epoch: 22 Global Step: 115280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:45,004-Speed 3415.79 samples/sec Loss 1.5466 LearningRate 0.0016 Epoch: 22 Global Step: 115290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:47,986-Speed 3435.17 samples/sec Loss 1.5027 LearningRate 0.0016 Epoch: 22 Global Step: 115300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:50,972-Speed 3429.94 samples/sec Loss 1.5132 LearningRate 0.0016 Epoch: 22 Global Step: 115310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:49:53,949-Speed 3440.27 samples/sec Loss 1.4452 LearningRate 0.0015 
Epoch: 22 Global Step: 115320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:49:56,941-Speed 3424.23 samples/sec Loss 1.5507 LearningRate 0.0015 Epoch: 22 Global Step: 115330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:49:59,920-Speed 3439.06 samples/sec Loss 1.4552 LearningRate 0.0015 Epoch: 22 Global Step: 115340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:50:02,912-Speed 3422.29 samples/sec Loss 1.5585 LearningRate 0.0015 Epoch: 22 Global Step: 115350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:50:05,897-Speed 3432.23 samples/sec Loss 1.5071 LearningRate 0.0015 Epoch: 22 Global Step: 115360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:50:08,861-Speed 3456.12 samples/sec Loss 1.4655 LearningRate 0.0015 Epoch: 22 Global Step: 115370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:11,839-Speed 3439.58 samples/sec Loss 1.5377 LearningRate 0.0015 Epoch: 22 Global Step: 115380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:14,827-Speed 3428.58 samples/sec Loss 1.5398 LearningRate 0.0015 Epoch: 22 Global Step: 115390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:17,830-Speed 3410.79 samples/sec Loss 1.4796 LearningRate 0.0015 Epoch: 22 Global Step: 115400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:20,820-Speed 3426.29 samples/sec Loss 1.6039 LearningRate 0.0015 Epoch: 22 Global Step: 115410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:23,801-Speed 3436.21 samples/sec Loss 1.5234 LearningRate 0.0015 Epoch: 22 Global Step: 115420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:26,795-Speed 3420.80 samples/sec Loss 1.4805 LearningRate 0.0015 Epoch: 22 Global Step: 115430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:29,778-Speed 3434.45 samples/sec Loss 1.5526 LearningRate 0.0015 Epoch: 22 Global Step: 115440 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:32,763-Speed 3431.09 samples/sec Loss 1.4178 LearningRate 0.0015 Epoch: 22 Global Step: 115450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:35,748-Speed 3431.50 samples/sec Loss 1.5045 LearningRate 0.0015 Epoch: 22 Global Step: 115460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:38,743-Speed 3419.78 samples/sec Loss 1.4935 LearningRate 0.0015 Epoch: 22 Global Step: 115470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:50:41,725-Speed 3435.22 samples/sec Loss 1.5826 LearningRate 0.0015 Epoch: 22 Global Step: 115480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:44,713-Speed 3428.07 samples/sec Loss 1.5046 LearningRate 0.0015 Epoch: 22 Global Step: 115490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:47,744-Speed 3378.88 samples/sec Loss 1.5258 LearningRate 0.0015 Epoch: 22 Global Step: 115500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:50,735-Speed 3424.79 samples/sec Loss 1.4672 LearningRate 0.0015 Epoch: 22 Global Step: 115510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:53,720-Speed 3431.60 samples/sec Loss 1.5325 LearningRate 0.0015 Epoch: 22 Global Step: 115520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:56,699-Speed 3438.35 samples/sec Loss 1.5441 LearningRate 0.0015 Epoch: 22 Global Step: 115530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:50:59,725-Speed 3384.67 samples/sec Loss 1.5116 LearningRate 0.0015 Epoch: 22 Global Step: 115540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:51:02,762-Speed 3373.33 samples/sec Loss 1.4935 LearningRate 0.0015 Epoch: 22 Global Step: 115550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:51:05,740-Speed 3438.52 samples/sec Loss 1.5349 LearningRate 0.0015 Epoch: 22 Global Step: 115560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-20 04:51:08,719-Speed 3438.48 samples/sec Loss 1.5307 LearningRate 0.0015 Epoch: 22 Global Step: 115570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:51:11,743-Speed 3387.23 samples/sec Loss 1.4937 LearningRate 0.0015 Epoch: 22 Global Step: 115580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:51:14,726-Speed 3434.12 samples/sec Loss 1.4550 LearningRate 0.0015 Epoch: 22 Global Step: 115590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:51:17,717-Speed 3423.72 samples/sec Loss 1.4670 LearningRate 0.0015 Epoch: 22 Global Step: 115600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:51:20,873-Speed 3246.13 samples/sec Loss 1.5110 LearningRate 0.0015 Epoch: 22 Global Step: 115610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:51:23,961-Speed 3316.77 samples/sec Loss 1.5048 LearningRate 0.0015 Epoch: 22 Global Step: 115620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:27,055-Speed 3310.47 samples/sec Loss 1.4882 LearningRate 0.0015 Epoch: 22 Global Step: 115630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:30,068-Speed 3399.53 samples/sec Loss 1.4497 LearningRate 0.0015 Epoch: 22 Global Step: 115640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:33,122-Speed 3354.48 samples/sec Loss 1.4619 LearningRate 0.0015 Epoch: 22 Global Step: 115650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:36,129-Speed 3405.87 samples/sec Loss 1.5032 LearningRate 0.0015 Epoch: 22 Global Step: 115660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:39,174-Speed 3363.22 samples/sec Loss 1.5271 LearningRate 0.0015 Epoch: 22 Global Step: 115670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:42,182-Speed 3404.88 samples/sec Loss 1.5090 LearningRate 0.0015 Epoch: 22 Global Step: 115680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:45,239-Speed 3351.54 
samples/sec Loss 1.5254 LearningRate 0.0014 Epoch: 22 Global Step: 115690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:48,238-Speed 3415.42 samples/sec Loss 1.5589 LearningRate 0.0014 Epoch: 22 Global Step: 115700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:51,277-Speed 3369.72 samples/sec Loss 1.4657 LearningRate 0.0014 Epoch: 22 Global Step: 115710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:51:54,262-Speed 3431.50 samples/sec Loss 1.4726 LearningRate 0.0014 Epoch: 22 Global Step: 115720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:51:57,246-Speed 3432.20 samples/sec Loss 1.5001 LearningRate 0.0014 Epoch: 22 Global Step: 115730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:00,236-Speed 3426.86 samples/sec Loss 1.4829 LearningRate 0.0014 Epoch: 22 Global Step: 115740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:03,219-Speed 3432.60 samples/sec Loss 1.5260 LearningRate 0.0014 Epoch: 22 Global Step: 115750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:06,258-Speed 3370.06 samples/sec Loss 1.4366 LearningRate 0.0014 Epoch: 22 Global Step: 115760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:09,245-Speed 3430.28 samples/sec Loss 1.5321 LearningRate 0.0014 Epoch: 22 Global Step: 115770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:12,240-Speed 3419.29 samples/sec Loss 1.5425 LearningRate 0.0014 Epoch: 22 Global Step: 115780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:15,219-Speed 3438.75 samples/sec Loss 1.5280 LearningRate 0.0014 Epoch: 22 Global Step: 115790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:18,208-Speed 3427.18 samples/sec Loss 1.5869 LearningRate 0.0014 Epoch: 22 Global Step: 115800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:21,205-Speed 3416.83 samples/sec Loss 1.5448 LearningRate 0.0014 
Epoch: 22 Global Step: 115810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:24,189-Speed 3433.48 samples/sec Loss 1.4620 LearningRate 0.0014 Epoch: 22 Global Step: 115820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:52:27,175-Speed 3429.65 samples/sec Loss 1.5776 LearningRate 0.0014 Epoch: 22 Global Step: 115830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:52:30,175-Speed 3415.59 samples/sec Loss 1.5253 LearningRate 0.0014 Epoch: 22 Global Step: 115840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:52:33,155-Speed 3436.34 samples/sec Loss 1.5564 LearningRate 0.0014 Epoch: 22 Global Step: 115850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:52:36,120-Speed 3454.72 samples/sec Loss 1.5083 LearningRate 0.0014 Epoch: 22 Global Step: 115860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:39,115-Speed 3419.96 samples/sec Loss 1.5848 LearningRate 0.0014 Epoch: 22 Global Step: 115870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:42,097-Speed 3434.32 samples/sec Loss 1.6089 LearningRate 0.0014 Epoch: 22 Global Step: 115880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:45,080-Speed 3434.03 samples/sec Loss 1.4810 LearningRate 0.0014 Epoch: 22 Global Step: 115890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:48,082-Speed 3412.09 samples/sec Loss 1.5137 LearningRate 0.0014 Epoch: 22 Global Step: 115900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:51,083-Speed 3413.26 samples/sec Loss 1.5075 LearningRate 0.0014 Epoch: 22 Global Step: 115910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:54,074-Speed 3424.60 samples/sec Loss 1.4739 LearningRate 0.0014 Epoch: 22 Global Step: 115920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:52:57,062-Speed 3427.15 samples/sec Loss 1.5178 LearningRate 0.0014 Epoch: 22 Global Step: 115930 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-20 04:53:00,054-Speed 3423.99 samples/sec Loss 1.5360 LearningRate 0.0014 Epoch: 22 Global Step: 115940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:53:03,066-Speed 3400.41 samples/sec Loss 1.5514 LearningRate 0.0014 Epoch: 22 Global Step: 115950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:53:06,052-Speed 3430.30 samples/sec Loss 1.4309 LearningRate 0.0014 Epoch: 22 Global Step: 115960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:53:09,037-Speed 3431.47 samples/sec Loss 1.5239 LearningRate 0.0014 Epoch: 22 Global Step: 115970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:53:12,018-Speed 3435.77 samples/sec Loss 1.5365 LearningRate 0.0014 Epoch: 22 Global Step: 115980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:53:15,001-Speed 3434.03 samples/sec Loss 1.4808 LearningRate 0.0014 Epoch: 22 Global Step: 115990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:53:18,045-Speed 3365.07 samples/sec Loss 1.4937 LearningRate 0.0014 Epoch: 22 Global Step: 116000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:54:01,195-[lfw][116000]XNorm: 22.518928 Training: 2022-01-20 04:54:01,196-[lfw][116000]Accuracy-Flip: 0.99817+-0.00263 Training: 2022-01-20 04:54:01,196-[lfw][116000]Accuracy-Highest: 0.99833 Training: 2022-01-20 04:54:51,054-[cfp_fp][116000]XNorm: 22.020415 Training: 2022-01-20 04:54:51,055-[cfp_fp][116000]Accuracy-Flip: 0.98886+-0.00514 Training: 2022-01-20 04:54:51,055-[cfp_fp][116000]Accuracy-Highest: 0.98957 Training: 2022-01-20 04:55:33,971-[agedb_30][116000]XNorm: 22.753317 Training: 2022-01-20 04:55:33,971-[agedb_30][116000]Accuracy-Flip: 0.98417+-0.00593 Training: 2022-01-20 04:55:33,972-[agedb_30][116000]Accuracy-Highest: 0.98433 Training: 2022-01-20 04:55:36,950-Speed 73.72 samples/sec Loss 1.5291 LearningRate 0.0014 Epoch: 22 Global Step: 116010 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-01-20 04:55:39,950-Speed 3414.47 samples/sec Loss 1.4842 LearningRate 0.0014 Epoch: 22 Global Step: 116020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:55:42,931-Speed 3436.19 samples/sec Loss 1.5756 LearningRate 0.0014 Epoch: 22 Global Step: 116030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:55:45,899-Speed 3451.70 samples/sec Loss 1.4288 LearningRate 0.0014 Epoch: 22 Global Step: 116040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:55:48,876-Speed 3440.05 samples/sec Loss 1.4851 LearningRate 0.0014 Epoch: 22 Global Step: 116050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:55:51,854-Speed 3439.98 samples/sec Loss 1.5734 LearningRate 0.0014 Epoch: 22 Global Step: 116060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:55:54,839-Speed 3430.25 samples/sec Loss 1.5488 LearningRate 0.0013 Epoch: 22 Global Step: 116070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:55:57,876-Speed 3373.05 samples/sec Loss 1.5979 LearningRate 0.0013 Epoch: 22 Global Step: 116080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:56:00,856-Speed 3437.75 samples/sec Loss 1.4678 LearningRate 0.0013 Epoch: 22 Global Step: 116090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:56:03,839-Speed 3432.90 samples/sec Loss 1.5235 LearningRate 0.0013 Epoch: 22 Global Step: 116100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:56:06,830-Speed 3425.46 samples/sec Loss 1.4531 LearningRate 0.0013 Epoch: 22 Global Step: 116110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:56:09,823-Speed 3421.98 samples/sec Loss 1.4244 LearningRate 0.0013 Epoch: 22 Global Step: 116120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:56:12,813-Speed 3425.45 samples/sec Loss 1.4701 LearningRate 0.0013 Epoch: 22 Global Step: 116130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 
04:56:15,792-Speed 3439.25 samples/sec Loss 1.5265 LearningRate 0.0013 Epoch: 22 Global Step: 116140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:56:18,770-Speed 3438.71 samples/sec Loss 1.4390 LearningRate 0.0013 Epoch: 22 Global Step: 116150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:56:21,761-Speed 3425.26 samples/sec Loss 1.5042 LearningRate 0.0013 Epoch: 22 Global Step: 116160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:56:24,872-Speed 3291.49 samples/sec Loss 1.4423 LearningRate 0.0013 Epoch: 22 Global Step: 116170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:56:27,955-Speed 3321.77 samples/sec Loss 1.5541 LearningRate 0.0013 Epoch: 22 Global Step: 116180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:56:30,981-Speed 3385.74 samples/sec Loss 1.6296 LearningRate 0.0013 Epoch: 22 Global Step: 116190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:56:33,941-Speed 3460.20 samples/sec Loss 1.4810 LearningRate 0.0013 Epoch: 22 Global Step: 116200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:56:36,923-Speed 3435.37 samples/sec Loss 1.5004 LearningRate 0.0013 Epoch: 22 Global Step: 116210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:56:39,975-Speed 3355.70 samples/sec Loss 1.5096 LearningRate 0.0013 Epoch: 22 Global Step: 116220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:56:42,957-Speed 3434.55 samples/sec Loss 1.6014 LearningRate 0.0013 Epoch: 22 Global Step: 116230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:56:45,943-Speed 3431.29 samples/sec Loss 1.4869 LearningRate 0.0013 Epoch: 22 Global Step: 116240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:56:48,916-Speed 3444.42 samples/sec Loss 1.5379 LearningRate 0.0013 Epoch: 22 Global Step: 116250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:56:51,896-Speed 3437.61 samples/sec Loss 
1.3660 LearningRate 0.0013 Epoch: 22 Global Step: 116260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:56:54,889-Speed 3422.18 samples/sec Loss 1.5209 LearningRate 0.0013 Epoch: 22 Global Step: 116270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:56:57,889-Speed 3413.24 samples/sec Loss 1.4532 LearningRate 0.0013 Epoch: 22 Global Step: 116280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:57:00,986-Speed 3307.62 samples/sec Loss 1.4744 LearningRate 0.0013 Epoch: 22 Global Step: 116290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 04:57:04,159-Speed 3227.89 samples/sec Loss 1.5186 LearningRate 0.0013 Epoch: 22 Global Step: 116300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:07,285-Speed 3276.62 samples/sec Loss 1.4904 LearningRate 0.0013 Epoch: 22 Global Step: 116310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:10,311-Speed 3385.80 samples/sec Loss 1.4709 LearningRate 0.0013 Epoch: 22 Global Step: 116320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:13,358-Speed 3361.53 samples/sec Loss 1.4979 LearningRate 0.0013 Epoch: 22 Global Step: 116330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:26,572-Speed 774.95 samples/sec Loss 1.2316 LearningRate 0.0013 Epoch: 23 Global Step: 116340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:29,664-Speed 3313.65 samples/sec Loss 1.1243 LearningRate 0.0013 Epoch: 23 Global Step: 116350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:32,808-Speed 3257.81 samples/sec Loss 1.1516 LearningRate 0.0013 Epoch: 23 Global Step: 116360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:35,782-Speed 3445.15 samples/sec Loss 1.2316 LearningRate 0.0013 Epoch: 23 Global Step: 116370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:38,776-Speed 3420.91 samples/sec Loss 1.1598 LearningRate 0.0013 Epoch: 23 Global 
Step: 116380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:41,855-Speed 3327.14 samples/sec Loss 1.0765 LearningRate 0.0013 Epoch: 23 Global Step: 116390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:44,863-Speed 3404.23 samples/sec Loss 1.1672 LearningRate 0.0013 Epoch: 23 Global Step: 116400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:57:47,836-Speed 3446.22 samples/sec Loss 1.1555 LearningRate 0.0013 Epoch: 23 Global Step: 116410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:57:50,841-Speed 3408.36 samples/sec Loss 1.1952 LearningRate 0.0013 Epoch: 23 Global Step: 116420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:57:53,858-Speed 3395.30 samples/sec Loss 1.1728 LearningRate 0.0013 Epoch: 23 Global Step: 116430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:57:56,843-Speed 3431.27 samples/sec Loss 1.1696 LearningRate 0.0013 Epoch: 23 Global Step: 116440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:57:59,816-Speed 3445.06 samples/sec Loss 1.1256 LearningRate 0.0013 Epoch: 23 Global Step: 116450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:02,796-Speed 3437.35 samples/sec Loss 1.2625 LearningRate 0.0012 Epoch: 23 Global Step: 116460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:05,836-Speed 3368.96 samples/sec Loss 1.1263 LearningRate 0.0012 Epoch: 23 Global Step: 116470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:08,850-Speed 3398.54 samples/sec Loss 1.1640 LearningRate 0.0012 Epoch: 23 Global Step: 116480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:11,843-Speed 3423.17 samples/sec Loss 1.1050 LearningRate 0.0012 Epoch: 23 Global Step: 116490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:14,846-Speed 3410.51 samples/sec Loss 1.1795 LearningRate 0.0012 Epoch: 23 Global Step: 116500 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-20 04:58:17,817-Speed 3447.15 samples/sec Loss 1.1605 LearningRate 0.0012 Epoch: 23 Global Step: 116510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:20,797-Speed 3437.90 samples/sec Loss 1.1367 LearningRate 0.0012 Epoch: 23 Global Step: 116520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:23,815-Speed 3393.11 samples/sec Loss 1.2705 LearningRate 0.0012 Epoch: 23 Global Step: 116530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:26,828-Speed 3399.88 samples/sec Loss 1.1159 LearningRate 0.0012 Epoch: 23 Global Step: 116540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:58:29,869-Speed 3368.68 samples/sec Loss 1.1383 LearningRate 0.0012 Epoch: 23 Global Step: 116550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:58:32,885-Speed 3395.68 samples/sec Loss 1.1605 LearningRate 0.0012 Epoch: 23 Global Step: 116560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:58:35,873-Speed 3428.24 samples/sec Loss 1.1331 LearningRate 0.0012 Epoch: 23 Global Step: 116570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:38,856-Speed 3434.00 samples/sec Loss 1.1501 LearningRate 0.0012 Epoch: 23 Global Step: 116580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:41,899-Speed 3365.86 samples/sec Loss 1.1022 LearningRate 0.0012 Epoch: 23 Global Step: 116590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:44,877-Speed 3439.14 samples/sec Loss 1.1468 LearningRate 0.0012 Epoch: 23 Global Step: 116600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:47,873-Speed 3418.57 samples/sec Loss 1.0928 LearningRate 0.0012 Epoch: 23 Global Step: 116610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:50,891-Speed 3394.40 samples/sec Loss 1.1871 LearningRate 0.0012 Epoch: 23 Global Step: 116620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 
04:58:53,880-Speed 3426.50 samples/sec Loss 1.1475 LearningRate 0.0012 Epoch: 23 Global Step: 116630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:56,866-Speed 3430.65 samples/sec Loss 1.1774 LearningRate 0.0012 Epoch: 23 Global Step: 116640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:58:59,873-Speed 3406.23 samples/sec Loss 1.1004 LearningRate 0.0012 Epoch: 23 Global Step: 116650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:02,982-Speed 3294.40 samples/sec Loss 1.0857 LearningRate 0.0012 Epoch: 23 Global Step: 116660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:06,000-Speed 3394.16 samples/sec Loss 1.1116 LearningRate 0.0012 Epoch: 23 Global Step: 116670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:59:08,998-Speed 3416.75 samples/sec Loss 1.1730 LearningRate 0.0012 Epoch: 23 Global Step: 116680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:59:12,001-Speed 3410.13 samples/sec Loss 1.1237 LearningRate 0.0012 Epoch: 23 Global Step: 116690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:59:14,984-Speed 3433.78 samples/sec Loss 1.1111 LearningRate 0.0012 Epoch: 23 Global Step: 116700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:59:17,956-Speed 3446.56 samples/sec Loss 1.1868 LearningRate 0.0012 Epoch: 23 Global Step: 116710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:20,955-Speed 3415.08 samples/sec Loss 1.2427 LearningRate 0.0012 Epoch: 23 Global Step: 116720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:23,996-Speed 3368.48 samples/sec Loss 1.2044 LearningRate 0.0012 Epoch: 23 Global Step: 116730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:27,043-Speed 3362.52 samples/sec Loss 1.0459 LearningRate 0.0012 Epoch: 23 Global Step: 116740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:30,037-Speed 3420.94 samples/sec Loss 
1.1654 LearningRate 0.0012 Epoch: 23 Global Step: 116750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:33,076-Speed 3370.16 samples/sec Loss 1.1851 LearningRate 0.0012 Epoch: 23 Global Step: 116760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:36,156-Speed 3325.97 samples/sec Loss 1.1416 LearningRate 0.0012 Epoch: 23 Global Step: 116770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:39,141-Speed 3430.81 samples/sec Loss 1.1515 LearningRate 0.0012 Epoch: 23 Global Step: 116780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:42,125-Speed 3433.27 samples/sec Loss 1.2022 LearningRate 0.0012 Epoch: 23 Global Step: 116790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:45,142-Speed 3394.75 samples/sec Loss 1.0939 LearningRate 0.0012 Epoch: 23 Global Step: 116800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 04:59:48,154-Speed 3400.61 samples/sec Loss 1.1689 LearningRate 0.0012 Epoch: 23 Global Step: 116810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:59:51,172-Speed 3394.31 samples/sec Loss 1.1155 LearningRate 0.0012 Epoch: 23 Global Step: 116820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:59:54,153-Speed 3436.22 samples/sec Loss 1.1181 LearningRate 0.0012 Epoch: 23 Global Step: 116830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 04:59:57,132-Speed 3438.80 samples/sec Loss 1.1613 LearningRate 0.0012 Epoch: 23 Global Step: 116840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:00,186-Speed 3352.77 samples/sec Loss 1.1837 LearningRate 0.0012 Epoch: 23 Global Step: 116850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:03,208-Speed 3389.23 samples/sec Loss 1.1899 LearningRate 0.0012 Epoch: 23 Global Step: 116860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:06,210-Speed 3413.29 samples/sec Loss 1.2068 LearningRate 0.0011 Epoch: 23 Global 
Step: 116870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:09,199-Speed 3425.62 samples/sec Loss 1.0625 LearningRate 0.0011 Epoch: 23 Global Step: 116880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:12,194-Speed 3420.26 samples/sec Loss 1.1811 LearningRate 0.0011 Epoch: 23 Global Step: 116890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:15,198-Speed 3409.79 samples/sec Loss 1.1775 LearningRate 0.0011 Epoch: 23 Global Step: 116900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:18,230-Speed 3378.45 samples/sec Loss 1.1210 LearningRate 0.0011 Epoch: 23 Global Step: 116910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:21,231-Speed 3412.91 samples/sec Loss 1.1432 LearningRate 0.0011 Epoch: 23 Global Step: 116920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:24,219-Speed 3428.84 samples/sec Loss 1.1082 LearningRate 0.0011 Epoch: 23 Global Step: 116930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:27,204-Speed 3431.09 samples/sec Loss 1.1775 LearningRate 0.0011 Epoch: 23 Global Step: 116940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:30,208-Speed 3409.90 samples/sec Loss 1.1329 LearningRate 0.0011 Epoch: 23 Global Step: 116950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:33,212-Speed 3409.14 samples/sec Loss 1.1211 LearningRate 0.0011 Epoch: 23 Global Step: 116960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:36,193-Speed 3437.08 samples/sec Loss 1.1545 LearningRate 0.0011 Epoch: 23 Global Step: 116970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:39,204-Speed 3401.47 samples/sec Loss 1.2576 LearningRate 0.0011 Epoch: 23 Global Step: 116980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:00:42,177-Speed 3445.32 samples/sec Loss 1.2228 LearningRate 0.0011 Epoch: 23 Global Step: 116990 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-20 05:00:45,156-Speed 3438.94 samples/sec Loss 1.1637 LearningRate 0.0011 Epoch: 23 Global Step: 117000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:00:48,146-Speed 3424.79 samples/sec Loss 1.1332 LearningRate 0.0011 Epoch: 23 Global Step: 117010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:00:51,130-Speed 3433.32 samples/sec Loss 1.1819 LearningRate 0.0011 Epoch: 23 Global Step: 117020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:00:54,117-Speed 3429.47 samples/sec Loss 1.1588 LearningRate 0.0011 Epoch: 23 Global Step: 117030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:00:57,108-Speed 3423.95 samples/sec Loss 1.1465 LearningRate 0.0011 Epoch: 23 Global Step: 117040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:01:00,107-Speed 3414.89 samples/sec Loss 1.1981 LearningRate 0.0011 Epoch: 23 Global Step: 117050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:01:03,192-Speed 3320.29 samples/sec Loss 1.1686 LearningRate 0.0011 Epoch: 23 Global Step: 117060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:01:06,223-Speed 3379.30 samples/sec Loss 1.1152 LearningRate 0.0011 Epoch: 23 Global Step: 117070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:01:09,240-Speed 3394.81 samples/sec Loss 1.1744 LearningRate 0.0011 Epoch: 23 Global Step: 117080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:01:12,236-Speed 3419.73 samples/sec Loss 1.2370 LearningRate 0.0011 Epoch: 23 Global Step: 117090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:15,328-Speed 3312.18 samples/sec Loss 1.1217 LearningRate 0.0011 Epoch: 23 Global Step: 117100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:18,336-Speed 3405.80 samples/sec Loss 1.2523 LearningRate 0.0011 Epoch: 23 Global Step: 117110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 
05:01:21,354-Speed 3394.17 samples/sec Loss 1.0962 LearningRate 0.0011 Epoch: 23 Global Step: 117120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:24,383-Speed 3380.95 samples/sec Loss 1.0655 LearningRate 0.0011 Epoch: 23 Global Step: 117130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:27,376-Speed 3422.67 samples/sec Loss 1.0981 LearningRate 0.0011 Epoch: 23 Global Step: 117140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:30,427-Speed 3357.69 samples/sec Loss 1.2038 LearningRate 0.0011 Epoch: 23 Global Step: 117150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:33,510-Speed 3321.62 samples/sec Loss 1.1580 LearningRate 0.0011 Epoch: 23 Global Step: 117160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:36,500-Speed 3425.59 samples/sec Loss 1.1441 LearningRate 0.0011 Epoch: 23 Global Step: 117170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:39,534-Speed 3375.64 samples/sec Loss 1.1746 LearningRate 0.0011 Epoch: 23 Global Step: 117180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:42,510-Speed 3442.88 samples/sec Loss 1.2151 LearningRate 0.0011 Epoch: 23 Global Step: 117190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:45,565-Speed 3353.13 samples/sec Loss 1.1605 LearningRate 0.0011 Epoch: 23 Global Step: 117200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:01:48,611-Speed 3362.26 samples/sec Loss 1.1347 LearningRate 0.0011 Epoch: 23 Global Step: 117210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:01:51,684-Speed 3333.09 samples/sec Loss 1.1724 LearningRate 0.0011 Epoch: 23 Global Step: 117220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:01:54,725-Speed 3368.90 samples/sec Loss 1.1427 LearningRate 0.0011 Epoch: 23 Global Step: 117230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:01:57,779-Speed 3353.26 samples/sec Loss 
1.1380 LearningRate 0.0011 Epoch: 23 Global Step: 117240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:00,806-Speed 3383.94 samples/sec Loss 1.0907 LearningRate 0.0011 Epoch: 23 Global Step: 117250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:03,849-Speed 3366.34 samples/sec Loss 1.1469 LearningRate 0.0011 Epoch: 23 Global Step: 117260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:06,857-Speed 3404.62 samples/sec Loss 1.0708 LearningRate 0.0011 Epoch: 23 Global Step: 117270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:09,859-Speed 3412.55 samples/sec Loss 1.1844 LearningRate 0.0011 Epoch: 23 Global Step: 117280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:12,847-Speed 3428.53 samples/sec Loss 1.1787 LearningRate 0.0010 Epoch: 23 Global Step: 117290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:15,831-Speed 3432.91 samples/sec Loss 1.0877 LearningRate 0.0010 Epoch: 23 Global Step: 117300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:18,802-Speed 3447.25 samples/sec Loss 1.2393 LearningRate 0.0010 Epoch: 23 Global Step: 117310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:21,824-Speed 3389.78 samples/sec Loss 1.1147 LearningRate 0.0010 Epoch: 23 Global Step: 117320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:24,900-Speed 3329.03 samples/sec Loss 1.1854 LearningRate 0.0010 Epoch: 23 Global Step: 117330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:27,966-Speed 3340.43 samples/sec Loss 1.1446 LearningRate 0.0010 Epoch: 23 Global Step: 117340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:30,970-Speed 3410.36 samples/sec Loss 1.0937 LearningRate 0.0010 Epoch: 23 Global Step: 117350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:33,956-Speed 3429.77 samples/sec Loss 1.1723 LearningRate 0.0010 Epoch: 23 Global 
Step: 117360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:36,939-Speed 3434.30 samples/sec Loss 1.1186 LearningRate 0.0010 Epoch: 23 Global Step: 117370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:39,915-Speed 3441.38 samples/sec Loss 1.2422 LearningRate 0.0010 Epoch: 23 Global Step: 117380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:42,895-Speed 3437.14 samples/sec Loss 1.2000 LearningRate 0.0010 Epoch: 23 Global Step: 117390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:45,891-Speed 3419.41 samples/sec Loss 1.1933 LearningRate 0.0010 Epoch: 23 Global Step: 117400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:48,882-Speed 3423.36 samples/sec Loss 1.1140 LearningRate 0.0010 Epoch: 23 Global Step: 117410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:02:51,888-Speed 3408.46 samples/sec Loss 1.1310 LearningRate 0.0010 Epoch: 23 Global Step: 117420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:54,918-Speed 3379.76 samples/sec Loss 1.1843 LearningRate 0.0010 Epoch: 23 Global Step: 117430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:02:57,907-Speed 3426.79 samples/sec Loss 1.2275 LearningRate 0.0010 Epoch: 23 Global Step: 117440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:00,931-Speed 3388.33 samples/sec Loss 1.1563 LearningRate 0.0010 Epoch: 23 Global Step: 117450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:03,907-Speed 3441.52 samples/sec Loss 1.0781 LearningRate 0.0010 Epoch: 23 Global Step: 117460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:06,890-Speed 3434.58 samples/sec Loss 1.1368 LearningRate 0.0010 Epoch: 23 Global Step: 117470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:09,867-Speed 3440.97 samples/sec Loss 1.2166 LearningRate 0.0010 Epoch: 23 Global Step: 117480 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-20 05:03:12,851-Speed 3431.89 samples/sec Loss 1.1479 LearningRate 0.0010 Epoch: 23 Global Step: 117490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:15,874-Speed 3388.34 samples/sec Loss 1.1186 LearningRate 0.0010 Epoch: 23 Global Step: 117500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:18,929-Speed 3352.43 samples/sec Loss 1.1500 LearningRate 0.0010 Epoch: 23 Global Step: 117510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:21,921-Speed 3423.80 samples/sec Loss 1.2541 LearningRate 0.0010 Epoch: 23 Global Step: 117520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:03:24,910-Speed 3426.47 samples/sec Loss 1.1969 LearningRate 0.0010 Epoch: 23 Global Step: 117530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:03:27,890-Speed 3437.49 samples/sec Loss 1.1726 LearningRate 0.0010 Epoch: 23 Global Step: 117540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:03:30,867-Speed 3439.80 samples/sec Loss 1.1425 LearningRate 0.0010 Epoch: 23 Global Step: 117550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:33,929-Speed 3346.22 samples/sec Loss 1.1710 LearningRate 0.0010 Epoch: 23 Global Step: 117560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:36,932-Speed 3410.23 samples/sec Loss 1.1569 LearningRate 0.0010 Epoch: 23 Global Step: 117570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:40,016-Speed 3321.17 samples/sec Loss 1.1612 LearningRate 0.0010 Epoch: 23 Global Step: 117580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:43,054-Speed 3371.94 samples/sec Loss 1.1043 LearningRate 0.0010 Epoch: 23 Global Step: 117590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:46,086-Speed 3377.72 samples/sec Loss 1.1562 LearningRate 0.0010 Epoch: 23 Global Step: 117600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 
05:03:49,091-Speed 3408.15 samples/sec Loss 1.2030 LearningRate 0.0010 Epoch: 23 Global Step: 117610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:52,081-Speed 3426.63 samples/sec Loss 1.1194 LearningRate 0.0010 Epoch: 23 Global Step: 117620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:55,066-Speed 3431.43 samples/sec Loss 1.1902 LearningRate 0.0010 Epoch: 23 Global Step: 117630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:03:58,048-Speed 3434.73 samples/sec Loss 1.1243 LearningRate 0.0010 Epoch: 23 Global Step: 117640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:01,031-Speed 3434.44 samples/sec Loss 1.2054 LearningRate 0.0010 Epoch: 23 Global Step: 117650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:04,014-Speed 3433.07 samples/sec Loss 1.1551 LearningRate 0.0010 Epoch: 23 Global Step: 117660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:06,996-Speed 3434.76 samples/sec Loss 1.1957 LearningRate 0.0010 Epoch: 23 Global Step: 117670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:09,992-Speed 3419.48 samples/sec Loss 1.2239 LearningRate 0.0010 Epoch: 23 Global Step: 117680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:12,994-Speed 3411.48 samples/sec Loss 1.1629 LearningRate 0.0010 Epoch: 23 Global Step: 117690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:15,976-Speed 3435.32 samples/sec Loss 1.2198 LearningRate 0.0010 Epoch: 23 Global Step: 117700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:18,957-Speed 3434.83 samples/sec Loss 1.1941 LearningRate 0.0010 Epoch: 23 Global Step: 117710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:21,940-Speed 3434.20 samples/sec Loss 1.1870 LearningRate 0.0010 Epoch: 23 Global Step: 117720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:24,935-Speed 3419.63 samples/sec Loss 
1.1123 LearningRate 0.0010 Epoch: 23 Global Step: 117730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:27,918-Speed 3434.36 samples/sec Loss 1.1823 LearningRate 0.0009 Epoch: 23 Global Step: 117740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:30,899-Speed 3436.02 samples/sec Loss 1.1539 LearningRate 0.0009 Epoch: 23 Global Step: 117750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:04:33,880-Speed 3435.80 samples/sec Loss 1.1614 LearningRate 0.0009 Epoch: 23 Global Step: 117760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:36,862-Speed 3435.50 samples/sec Loss 1.2265 LearningRate 0.0009 Epoch: 23 Global Step: 117770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:39,845-Speed 3433.15 samples/sec Loss 1.1872 LearningRate 0.0009 Epoch: 23 Global Step: 117780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:42,828-Speed 3434.07 samples/sec Loss 1.0820 LearningRate 0.0009 Epoch: 23 Global Step: 117790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:45,812-Speed 3432.32 samples/sec Loss 1.1625 LearningRate 0.0009 Epoch: 23 Global Step: 117800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:48,814-Speed 3411.73 samples/sec Loss 1.1564 LearningRate 0.0009 Epoch: 23 Global Step: 117810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:51,801-Speed 3428.78 samples/sec Loss 1.1875 LearningRate 0.0009 Epoch: 23 Global Step: 117820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:54,791-Speed 3426.62 samples/sec Loss 1.2229 LearningRate 0.0009 Epoch: 23 Global Step: 117830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:04:57,798-Speed 3406.06 samples/sec Loss 1.1488 LearningRate 0.0009 Epoch: 23 Global Step: 117840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:05:00,894-Speed 3308.20 samples/sec Loss 1.1510 LearningRate 0.0009 Epoch: 23 Global 
Step: 117850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:05:03,900-Speed 3406.88 samples/sec Loss 1.1460 LearningRate 0.0009 Epoch: 23 Global Step: 117860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:06,885-Speed 3432.66 samples/sec Loss 1.2145 LearningRate 0.0009 Epoch: 23 Global Step: 117870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:09,950-Speed 3341.33 samples/sec Loss 1.1726 LearningRate 0.0009 Epoch: 23 Global Step: 117880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:13,048-Speed 3306.34 samples/sec Loss 1.1840 LearningRate 0.0009 Epoch: 23 Global Step: 117890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:16,106-Speed 3349.40 samples/sec Loss 1.0838 LearningRate 0.0009 Epoch: 23 Global Step: 117900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:19,111-Speed 3407.93 samples/sec Loss 1.2282 LearningRate 0.0009 Epoch: 23 Global Step: 117910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:22,167-Speed 3361.68 samples/sec Loss 1.1364 LearningRate 0.0009 Epoch: 23 Global Step: 117920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:25,160-Speed 3422.93 samples/sec Loss 1.2124 LearningRate 0.0009 Epoch: 23 Global Step: 117930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:28,184-Speed 3386.57 samples/sec Loss 1.1139 LearningRate 0.0009 Epoch: 23 Global Step: 117940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:31,183-Speed 3415.69 samples/sec Loss 1.2253 LearningRate 0.0009 Epoch: 23 Global Step: 117950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:34,204-Speed 3390.19 samples/sec Loss 1.1578 LearningRate 0.0009 Epoch: 23 Global Step: 117960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:05:37,201-Speed 3417.72 samples/sec Loss 1.2863 LearningRate 0.0009 Epoch: 23 Global Step: 117970 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-20 05:05:40,300-Speed 3304.36 samples/sec Loss 1.1483 LearningRate 0.0009 Epoch: 23 Global Step: 117980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:43,362-Speed 3345.78 samples/sec Loss 1.1329 LearningRate 0.0009 Epoch: 23 Global Step: 117990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:05:46,346-Speed 3432.65 samples/sec Loss 1.2094 LearningRate 0.0009 Epoch: 23 Global Step: 118000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:06:29,291-[lfw][118000]XNorm: 22.497644 Training: 2022-01-20 05:06:29,291-[lfw][118000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-01-20 05:06:29,292-[lfw][118000]Accuracy-Highest: 0.99833 Training: 2022-01-20 05:07:19,277-[cfp_fp][118000]XNorm: 22.106553 Training: 2022-01-20 05:07:19,278-[cfp_fp][118000]Accuracy-Flip: 0.98957+-0.00495 Training: 2022-01-20 05:07:19,278-[cfp_fp][118000]Accuracy-Highest: 0.98957 Training: 2022-01-20 05:08:02,171-[agedb_30][118000]XNorm: 22.682490 Training: 2022-01-20 05:08:02,171-[agedb_30][118000]Accuracy-Flip: 0.98517+-0.00598 Training: 2022-01-20 05:08:02,172-[agedb_30][118000]Accuracy-Highest: 0.98517 Training: 2022-01-20 05:08:05,159-Speed 73.77 samples/sec Loss 1.1321 LearningRate 0.0009 Epoch: 23 Global Step: 118010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:08,149-Speed 3426.48 samples/sec Loss 1.2017 LearningRate 0.0009 Epoch: 23 Global Step: 118020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:11,158-Speed 3403.65 samples/sec Loss 1.1743 LearningRate 0.0009 Epoch: 23 Global Step: 118030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:14,136-Speed 3439.25 samples/sec Loss 1.1526 LearningRate 0.0009 Epoch: 23 Global Step: 118040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:17,126-Speed 3425.02 samples/sec Loss 1.0951 LearningRate 0.0009 Epoch: 23 Global Step: 118050 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-01-20 05:08:20,134-Speed 3405.80 samples/sec Loss 1.1787 LearningRate 0.0009 Epoch: 23 Global Step: 118060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:23,120-Speed 3430.16 samples/sec Loss 1.2387 LearningRate 0.0009 Epoch: 23 Global Step: 118070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:08:26,101-Speed 3436.78 samples/sec Loss 1.1153 LearningRate 0.0009 Epoch: 23 Global Step: 118080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:29,112-Speed 3401.26 samples/sec Loss 1.2269 LearningRate 0.0009 Epoch: 23 Global Step: 118090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:32,133-Speed 3390.71 samples/sec Loss 1.2029 LearningRate 0.0009 Epoch: 23 Global Step: 118100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:35,166-Speed 3377.54 samples/sec Loss 1.2078 LearningRate 0.0009 Epoch: 23 Global Step: 118110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:38,211-Speed 3363.53 samples/sec Loss 1.1064 LearningRate 0.0009 Epoch: 23 Global Step: 118120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:41,193-Speed 3434.36 samples/sec Loss 1.1949 LearningRate 0.0009 Epoch: 23 Global Step: 118130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:44,178-Speed 3431.18 samples/sec Loss 1.1586 LearningRate 0.0009 Epoch: 23 Global Step: 118140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:47,218-Speed 3369.11 samples/sec Loss 1.2226 LearningRate 0.0009 Epoch: 23 Global Step: 118150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:50,266-Speed 3360.99 samples/sec Loss 1.1762 LearningRate 0.0009 Epoch: 23 Global Step: 118160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:54,143-Speed 2642.50 samples/sec Loss 1.1846 LearningRate 0.0009 Epoch: 23 Global Step: 118170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:08:57,846-Speed 
2765.32 samples/sec Loss 1.1776 LearningRate 0.0009 Epoch: 23 Global Step: 118180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:09:00,859-Speed 3399.15 samples/sec Loss 1.1486 LearningRate 0.0009 Epoch: 23 Global Step: 118190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:09:03,899-Speed 3369.52 samples/sec Loss 1.1411 LearningRate 0.0009 Epoch: 23 Global Step: 118200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:09:06,895-Speed 3418.78 samples/sec Loss 1.1801 LearningRate 0.0008 Epoch: 23 Global Step: 118210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:09,905-Speed 3403.02 samples/sec Loss 1.2059 LearningRate 0.0008 Epoch: 23 Global Step: 118220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:12,911-Speed 3407.23 samples/sec Loss 1.1912 LearningRate 0.0008 Epoch: 23 Global Step: 118230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:15,916-Speed 3408.82 samples/sec Loss 1.1415 LearningRate 0.0008 Epoch: 23 Global Step: 118240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:18,908-Speed 3423.84 samples/sec Loss 1.2171 LearningRate 0.0008 Epoch: 23 Global Step: 118250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:21,895-Speed 3428.82 samples/sec Loss 1.1958 LearningRate 0.0008 Epoch: 23 Global Step: 118260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:24,890-Speed 3420.22 samples/sec Loss 1.1318 LearningRate 0.0008 Epoch: 23 Global Step: 118270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:27,988-Speed 3305.56 samples/sec Loss 1.1690 LearningRate 0.0008 Epoch: 23 Global Step: 118280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:31,006-Speed 3394.57 samples/sec Loss 1.1834 LearningRate 0.0008 Epoch: 23 Global Step: 118290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:33,991-Speed 3430.58 samples/sec Loss 1.1560 
LearningRate 0.0008 Epoch: 23 Global Step: 118300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:36,977-Speed 3430.76 samples/sec Loss 1.1851 LearningRate 0.0008 Epoch: 23 Global Step: 118310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:09:39,971-Speed 3421.18 samples/sec Loss 1.2460 LearningRate 0.0008 Epoch: 23 Global Step: 118320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:43,008-Speed 3372.46 samples/sec Loss 1.1226 LearningRate 0.0008 Epoch: 23 Global Step: 118330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:46,192-Speed 3216.56 samples/sec Loss 1.0821 LearningRate 0.0008 Epoch: 23 Global Step: 118340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:49,233-Speed 3368.52 samples/sec Loss 1.2355 LearningRate 0.0008 Epoch: 23 Global Step: 118350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:52,215-Speed 3434.44 samples/sec Loss 1.1169 LearningRate 0.0008 Epoch: 23 Global Step: 118360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:55,196-Speed 3437.00 samples/sec Loss 1.1207 LearningRate 0.0008 Epoch: 23 Global Step: 118370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:09:58,191-Speed 3419.48 samples/sec Loss 1.0991 LearningRate 0.0008 Epoch: 23 Global Step: 118380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:01,247-Speed 3351.23 samples/sec Loss 1.2582 LearningRate 0.0008 Epoch: 23 Global Step: 118390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:04,225-Speed 3440.34 samples/sec Loss 1.1175 LearningRate 0.0008 Epoch: 23 Global Step: 118400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:07,238-Speed 3398.66 samples/sec Loss 1.2755 LearningRate 0.0008 Epoch: 23 Global Step: 118410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:10,217-Speed 3438.72 samples/sec Loss 1.1535 LearningRate 0.0008 Epoch: 23 Global Step: 
118420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:10:13,205-Speed 3428.65 samples/sec Loss 1.1794 LearningRate 0.0008 Epoch: 23 Global Step: 118430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:10:16,185-Speed 3437.37 samples/sec Loss 1.1944 LearningRate 0.0008 Epoch: 23 Global Step: 118440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:10:19,235-Speed 3357.83 samples/sec Loss 1.1577 LearningRate 0.0008 Epoch: 23 Global Step: 118450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:22,369-Speed 3268.40 samples/sec Loss 1.2385 LearningRate 0.0008 Epoch: 23 Global Step: 118460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:25,404-Speed 3374.76 samples/sec Loss 1.1636 LearningRate 0.0008 Epoch: 23 Global Step: 118470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:28,384-Speed 3438.01 samples/sec Loss 1.1445 LearningRate 0.0008 Epoch: 23 Global Step: 118480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:31,397-Speed 3399.63 samples/sec Loss 1.1936 LearningRate 0.0008 Epoch: 23 Global Step: 118490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:34,386-Speed 3426.32 samples/sec Loss 1.1525 LearningRate 0.0008 Epoch: 23 Global Step: 118500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:37,450-Speed 3342.96 samples/sec Loss 1.1407 LearningRate 0.0008 Epoch: 23 Global Step: 118510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:40,429-Speed 3438.13 samples/sec Loss 1.1660 LearningRate 0.0008 Epoch: 23 Global Step: 118520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:43,413-Speed 3432.91 samples/sec Loss 1.2012 LearningRate 0.0008 Epoch: 23 Global Step: 118530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:10:46,399-Speed 3430.50 samples/sec Loss 1.2856 LearningRate 0.0008 Epoch: 23 Global Step: 118540 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-01-20 05:10:49,380-Speed 3435.58 samples/sec Loss 1.1174 LearningRate 0.0008 Epoch: 23 Global Step: 118550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:10:52,405-Speed 3385.95 samples/sec Loss 1.1489 LearningRate 0.0008 Epoch: 23 Global Step: 118560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:10:55,498-Speed 3312.15 samples/sec Loss 1.1679 LearningRate 0.0008 Epoch: 23 Global Step: 118570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:10:58,668-Speed 3231.49 samples/sec Loss 1.1963 LearningRate 0.0008 Epoch: 23 Global Step: 118580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:11:01,778-Speed 3293.42 samples/sec Loss 1.1697 LearningRate 0.0008 Epoch: 23 Global Step: 118590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:11:04,884-Speed 3297.29 samples/sec Loss 1.1469 LearningRate 0.0008 Epoch: 23 Global Step: 118600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:11:07,869-Speed 3432.22 samples/sec Loss 1.2443 LearningRate 0.0008 Epoch: 23 Global Step: 118610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:11:10,877-Speed 3404.17 samples/sec Loss 1.1641 LearningRate 0.0008 Epoch: 23 Global Step: 118620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:13,907-Speed 3380.51 samples/sec Loss 1.1347 LearningRate 0.0008 Epoch: 23 Global Step: 118630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:16,889-Speed 3435.74 samples/sec Loss 1.0865 LearningRate 0.0008 Epoch: 23 Global Step: 118640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:19,883-Speed 3421.01 samples/sec Loss 1.1553 LearningRate 0.0008 Epoch: 23 Global Step: 118650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:22,896-Speed 3399.05 samples/sec Loss 1.1489 LearningRate 0.0008 Epoch: 23 Global Step: 118660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 
05:11:25,876-Speed 3437.28 samples/sec Loss 1.2078 LearningRate 0.0008 Epoch: 23 Global Step: 118670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:28,865-Speed 3426.82 samples/sec Loss 1.2220 LearningRate 0.0008 Epoch: 23 Global Step: 118680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:31,848-Speed 3433.91 samples/sec Loss 1.1778 LearningRate 0.0008 Epoch: 23 Global Step: 118690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:34,827-Speed 3438.57 samples/sec Loss 1.2507 LearningRate 0.0008 Epoch: 23 Global Step: 118700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:37,812-Speed 3430.66 samples/sec Loss 1.1693 LearningRate 0.0007 Epoch: 23 Global Step: 118710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:40,800-Speed 3428.10 samples/sec Loss 1.0789 LearningRate 0.0007 Epoch: 23 Global Step: 118720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:11:43,770-Speed 3449.83 samples/sec Loss 1.2727 LearningRate 0.0007 Epoch: 23 Global Step: 118730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:46,750-Speed 3436.39 samples/sec Loss 1.2035 LearningRate 0.0007 Epoch: 23 Global Step: 118740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:49,777-Speed 3383.66 samples/sec Loss 1.1487 LearningRate 0.0007 Epoch: 23 Global Step: 118750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:52,774-Speed 3417.50 samples/sec Loss 1.1467 LearningRate 0.0007 Epoch: 23 Global Step: 118760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:55,762-Speed 3428.05 samples/sec Loss 1.2270 LearningRate 0.0007 Epoch: 23 Global Step: 118770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:11:58,751-Speed 3427.58 samples/sec Loss 1.1508 LearningRate 0.0007 Epoch: 23 Global Step: 118780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:01,747-Speed 3418.63 samples/sec Loss 
1.1063 LearningRate 0.0007 Epoch: 23 Global Step: 118790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:04,842-Speed 3309.74 samples/sec Loss 1.1390 LearningRate 0.0007 Epoch: 23 Global Step: 118800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:07,851-Speed 3403.68 samples/sec Loss 1.1818 LearningRate 0.0007 Epoch: 23 Global Step: 118810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:10,858-Speed 3406.65 samples/sec Loss 1.1557 LearningRate 0.0007 Epoch: 23 Global Step: 118820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:13,840-Speed 3435.08 samples/sec Loss 1.2140 LearningRate 0.0007 Epoch: 23 Global Step: 118830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:12:16,941-Speed 3302.65 samples/sec Loss 1.1598 LearningRate 0.0007 Epoch: 23 Global Step: 118840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:12:19,924-Speed 3433.60 samples/sec Loss 1.1815 LearningRate 0.0007 Epoch: 23 Global Step: 118850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:12:22,906-Speed 3434.43 samples/sec Loss 1.1648 LearningRate 0.0007 Epoch: 23 Global Step: 118860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:12:25,892-Speed 3430.39 samples/sec Loss 1.1258 LearningRate 0.0007 Epoch: 23 Global Step: 118870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:12:28,877-Speed 3431.89 samples/sec Loss 1.2297 LearningRate 0.0007 Epoch: 23 Global Step: 118880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:12:31,907-Speed 3380.76 samples/sec Loss 1.1206 LearningRate 0.0007 Epoch: 23 Global Step: 118890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:12:34,896-Speed 3429.11 samples/sec Loss 1.0880 LearningRate 0.0007 Epoch: 23 Global Step: 118900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:38,030-Speed 3267.70 samples/sec Loss 1.2200 LearningRate 0.0007 Epoch: 23 Global 
Step: 118910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:41,090-Speed 3346.48 samples/sec Loss 1.1744 LearningRate 0.0007 Epoch: 23 Global Step: 118920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:44,088-Speed 3417.19 samples/sec Loss 1.2097 LearningRate 0.0007 Epoch: 23 Global Step: 118930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:47,069-Speed 3435.66 samples/sec Loss 1.1476 LearningRate 0.0007 Epoch: 23 Global Step: 118940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:50,060-Speed 3424.61 samples/sec Loss 1.0936 LearningRate 0.0007 Epoch: 23 Global Step: 118950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:53,046-Speed 3430.32 samples/sec Loss 1.1641 LearningRate 0.0007 Epoch: 23 Global Step: 118960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:56,083-Speed 3372.82 samples/sec Loss 1.1828 LearningRate 0.0007 Epoch: 23 Global Step: 118970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:12:59,205-Speed 3280.57 samples/sec Loss 1.1980 LearningRate 0.0007 Epoch: 23 Global Step: 118980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:02,201-Speed 3418.60 samples/sec Loss 1.2478 LearningRate 0.0007 Epoch: 23 Global Step: 118990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:05,312-Speed 3293.35 samples/sec Loss 1.1201 LearningRate 0.0007 Epoch: 23 Global Step: 119000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:13:08,308-Speed 3417.94 samples/sec Loss 1.2513 LearningRate 0.0007 Epoch: 23 Global Step: 119010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:11,402-Speed 3310.77 samples/sec Loss 1.1326 LearningRate 0.0007 Epoch: 23 Global Step: 119020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:14,399-Speed 3417.19 samples/sec Loss 1.1666 LearningRate 0.0007 Epoch: 23 Global Step: 119030 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-20 05:13:17,403-Speed 3410.19 samples/sec Loss 1.2799 LearningRate 0.0007 Epoch: 23 Global Step: 119040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:20,442-Speed 3370.69 samples/sec Loss 1.1381 LearningRate 0.0007 Epoch: 23 Global Step: 119050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:23,458-Speed 3396.34 samples/sec Loss 1.1904 LearningRate 0.0007 Epoch: 23 Global Step: 119060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:26,641-Speed 3217.82 samples/sec Loss 1.1643 LearningRate 0.0007 Epoch: 23 Global Step: 119070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:29,652-Speed 3402.01 samples/sec Loss 1.1225 LearningRate 0.0007 Epoch: 23 Global Step: 119080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:32,640-Speed 3428.28 samples/sec Loss 1.1330 LearningRate 0.0007 Epoch: 23 Global Step: 119090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:35,624-Speed 3432.89 samples/sec Loss 1.1078 LearningRate 0.0007 Epoch: 23 Global Step: 119100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:38,621-Speed 3417.53 samples/sec Loss 1.2440 LearningRate 0.0007 Epoch: 23 Global Step: 119110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:13:41,586-Speed 3454.30 samples/sec Loss 1.1350 LearningRate 0.0007 Epoch: 23 Global Step: 119120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:44,589-Speed 3410.69 samples/sec Loss 1.2146 LearningRate 0.0007 Epoch: 23 Global Step: 119130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:47,581-Speed 3423.56 samples/sec Loss 1.1810 LearningRate 0.0007 Epoch: 23 Global Step: 119140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:50,573-Speed 3423.53 samples/sec Loss 1.2305 LearningRate 0.0007 Epoch: 23 Global Step: 119150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 
05:13:53,552-Speed 3438.15 samples/sec Loss 1.1374 LearningRate 0.0007 Epoch: 23 Global Step: 119160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:56,532-Speed 3437.46 samples/sec Loss 1.1885 LearningRate 0.0007 Epoch: 23 Global Step: 119170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:13:59,514-Speed 3435.12 samples/sec Loss 1.2031 LearningRate 0.0007 Epoch: 23 Global Step: 119180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:02,503-Speed 3426.52 samples/sec Loss 1.1710 LearningRate 0.0007 Epoch: 23 Global Step: 119190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:05,506-Speed 3410.39 samples/sec Loss 1.1168 LearningRate 0.0007 Epoch: 23 Global Step: 119200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:08,495-Speed 3426.68 samples/sec Loss 1.1137 LearningRate 0.0007 Epoch: 23 Global Step: 119210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:11,532-Speed 3372.73 samples/sec Loss 1.2067 LearningRate 0.0007 Epoch: 23 Global Step: 119220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:14,516-Speed 3433.27 samples/sec Loss 1.1440 LearningRate 0.0007 Epoch: 23 Global Step: 119230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:17,511-Speed 3419.88 samples/sec Loss 1.2006 LearningRate 0.0007 Epoch: 23 Global Step: 119240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:20,516-Speed 3408.28 samples/sec Loss 1.2070 LearningRate 0.0006 Epoch: 23 Global Step: 119250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:23,499-Speed 3433.77 samples/sec Loss 1.1421 LearningRate 0.0006 Epoch: 23 Global Step: 119260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:26,483-Speed 3432.92 samples/sec Loss 1.1363 LearningRate 0.0006 Epoch: 23 Global Step: 119270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:29,473-Speed 3425.60 samples/sec Loss 
1.1665 LearningRate 0.0006 Epoch: 23 Global Step: 119280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:32,463-Speed 3425.82 samples/sec Loss 1.2473 LearningRate 0.0006 Epoch: 23 Global Step: 119290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:35,467-Speed 3409.53 samples/sec Loss 1.1012 LearningRate 0.0006 Epoch: 23 Global Step: 119300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:38,468-Speed 3414.09 samples/sec Loss 1.2459 LearningRate 0.0006 Epoch: 23 Global Step: 119310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:14:41,499-Speed 3378.50 samples/sec Loss 1.2060 LearningRate 0.0006 Epoch: 23 Global Step: 119320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:14:44,499-Speed 3414.02 samples/sec Loss 1.0529 LearningRate 0.0006 Epoch: 23 Global Step: 119330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:14:47,483-Speed 3433.60 samples/sec Loss 1.0582 LearningRate 0.0006 Epoch: 23 Global Step: 119340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:14:50,469-Speed 3430.35 samples/sec Loss 1.1711 LearningRate 0.0006 Epoch: 23 Global Step: 119350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:14:53,455-Speed 3430.23 samples/sec Loss 1.2170 LearningRate 0.0006 Epoch: 23 Global Step: 119360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:14:56,467-Speed 3400.34 samples/sec Loss 1.2744 LearningRate 0.0006 Epoch: 23 Global Step: 119370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:14:59,443-Speed 3441.47 samples/sec Loss 1.2195 LearningRate 0.0006 Epoch: 23 Global Step: 119380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:02,514-Speed 3335.26 samples/sec Loss 1.1563 LearningRate 0.0006 Epoch: 23 Global Step: 119390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:05,585-Speed 3335.25 samples/sec Loss 1.1490 LearningRate 0.0006 Epoch: 23 Global 
Step: 119400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:08,693-Speed 3295.40 samples/sec Loss 1.1737 LearningRate 0.0006 Epoch: 23 Global Step: 119410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:11,687-Speed 3421.33 samples/sec Loss 1.1795 LearningRate 0.0006 Epoch: 23 Global Step: 119420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:14,710-Speed 3389.09 samples/sec Loss 1.1494 LearningRate 0.0006 Epoch: 23 Global Step: 119430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:17,699-Speed 3427.01 samples/sec Loss 1.1223 LearningRate 0.0006 Epoch: 23 Global Step: 119440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:20,699-Speed 3414.02 samples/sec Loss 1.1474 LearningRate 0.0006 Epoch: 23 Global Step: 119450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:23,699-Speed 3414.62 samples/sec Loss 1.1696 LearningRate 0.0006 Epoch: 23 Global Step: 119460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:26,685-Speed 3429.19 samples/sec Loss 1.1280 LearningRate 0.0006 Epoch: 23 Global Step: 119470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:29,826-Speed 3261.87 samples/sec Loss 1.1378 LearningRate 0.0006 Epoch: 23 Global Step: 119480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:15:32,860-Speed 3375.18 samples/sec Loss 1.1069 LearningRate 0.0006 Epoch: 23 Global Step: 119490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:35,854-Speed 3422.30 samples/sec Loss 1.1378 LearningRate 0.0006 Epoch: 23 Global Step: 119500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:38,863-Speed 3403.40 samples/sec Loss 1.1785 LearningRate 0.0006 Epoch: 23 Global Step: 119510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:41,865-Speed 3412.85 samples/sec Loss 1.1031 LearningRate 0.0006 Epoch: 23 Global Step: 119520 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-20 05:15:44,852-Speed 3428.60 samples/sec Loss 1.1625 LearningRate 0.0006 Epoch: 23 Global Step: 119530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:47,989-Speed 3265.66 samples/sec Loss 1.1968 LearningRate 0.0006 Epoch: 23 Global Step: 119540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:50,972-Speed 3432.86 samples/sec Loss 1.1265 LearningRate 0.0006 Epoch: 23 Global Step: 119550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:53,990-Speed 3394.78 samples/sec Loss 1.2096 LearningRate 0.0006 Epoch: 23 Global Step: 119560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:15:57,052-Speed 3344.78 samples/sec Loss 1.2339 LearningRate 0.0006 Epoch: 23 Global Step: 119570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:00,045-Speed 3422.84 samples/sec Loss 1.2065 LearningRate 0.0006 Epoch: 23 Global Step: 119580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:03,030-Speed 3430.86 samples/sec Loss 1.1931 LearningRate 0.0006 Epoch: 23 Global Step: 119590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:06,049-Speed 3393.10 samples/sec Loss 1.1289 LearningRate 0.0006 Epoch: 23 Global Step: 119600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:09,049-Speed 3414.64 samples/sec Loss 1.1728 LearningRate 0.0006 Epoch: 23 Global Step: 119610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:12,064-Speed 3397.44 samples/sec Loss 1.1045 LearningRate 0.0006 Epoch: 23 Global Step: 119620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:15,099-Speed 3374.80 samples/sec Loss 1.2223 LearningRate 0.0006 Epoch: 23 Global Step: 119630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:18,125-Speed 3385.19 samples/sec Loss 1.2258 LearningRate 0.0006 Epoch: 23 Global Step: 119640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 
05:16:21,193-Speed 3338.06 samples/sec Loss 1.1294 LearningRate 0.0006 Epoch: 23 Global Step: 119650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:24,174-Speed 3436.25 samples/sec Loss 1.1861 LearningRate 0.0006 Epoch: 23 Global Step: 119660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:27,205-Speed 3378.74 samples/sec Loss 1.1366 LearningRate 0.0006 Epoch: 23 Global Step: 119670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:30,257-Speed 3357.15 samples/sec Loss 1.1718 LearningRate 0.0006 Epoch: 23 Global Step: 119680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:33,294-Speed 3371.45 samples/sec Loss 1.1879 LearningRate 0.0006 Epoch: 23 Global Step: 119690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:16:36,326-Speed 3378.49 samples/sec Loss 1.0672 LearningRate 0.0006 Epoch: 23 Global Step: 119700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:16:39,355-Speed 3382.45 samples/sec Loss 1.1698 LearningRate 0.0006 Epoch: 23 Global Step: 119710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:16:42,397-Speed 3366.99 samples/sec Loss 1.1197 LearningRate 0.0006 Epoch: 23 Global Step: 119720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:16:45,400-Speed 3410.92 samples/sec Loss 1.1571 LearningRate 0.0006 Epoch: 23 Global Step: 119730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:16:48,432-Speed 3377.95 samples/sec Loss 1.2322 LearningRate 0.0006 Epoch: 23 Global Step: 119740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:51,505-Speed 3332.93 samples/sec Loss 1.2154 LearningRate 0.0006 Epoch: 23 Global Step: 119750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:54,510-Speed 3408.13 samples/sec Loss 1.1582 LearningRate 0.0006 Epoch: 23 Global Step: 119760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:16:57,502-Speed 3424.16 samples/sec Loss 
1.0872 LearningRate 0.0006 Epoch: 23 Global Step: 119770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:00,470-Speed 3450.26 samples/sec Loss 1.0784 LearningRate 0.0006 Epoch: 23 Global Step: 119780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:03,463-Speed 3422.49 samples/sec Loss 1.1525 LearningRate 0.0006 Epoch: 23 Global Step: 119790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:06,530-Speed 3339.78 samples/sec Loss 1.1310 LearningRate 0.0006 Epoch: 23 Global Step: 119800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:09,612-Speed 3323.52 samples/sec Loss 1.1185 LearningRate 0.0006 Epoch: 23 Global Step: 119810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:12,634-Speed 3389.62 samples/sec Loss 1.1887 LearningRate 0.0005 Epoch: 23 Global Step: 119820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:15,615-Speed 3435.35 samples/sec Loss 1.1933 LearningRate 0.0005 Epoch: 23 Global Step: 119830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:18,602-Speed 3429.04 samples/sec Loss 1.1504 LearningRate 0.0005 Epoch: 23 Global Step: 119840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:21,645-Speed 3366.23 samples/sec Loss 1.2711 LearningRate 0.0005 Epoch: 23 Global Step: 119850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:24,628-Speed 3434.11 samples/sec Loss 1.1411 LearningRate 0.0005 Epoch: 23 Global Step: 119860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:27,622-Speed 3420.28 samples/sec Loss 1.2447 LearningRate 0.0005 Epoch: 23 Global Step: 119870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:17:30,683-Speed 3347.43 samples/sec Loss 1.2662 LearningRate 0.0005 Epoch: 23 Global Step: 119880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:33,673-Speed 3424.67 samples/sec Loss 1.1334 LearningRate 0.0005 Epoch: 23 Global 
Step: 119890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:36,691-Speed 3394.38 samples/sec Loss 1.1813 LearningRate 0.0005 Epoch: 23 Global Step: 119900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:39,833-Speed 3260.06 samples/sec Loss 1.1669 LearningRate 0.0005 Epoch: 23 Global Step: 119910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:42,851-Speed 3393.34 samples/sec Loss 1.2738 LearningRate 0.0005 Epoch: 23 Global Step: 119920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:45,841-Speed 3426.08 samples/sec Loss 1.1808 LearningRate 0.0005 Epoch: 23 Global Step: 119930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:48,898-Speed 3350.00 samples/sec Loss 1.2322 LearningRate 0.0005 Epoch: 23 Global Step: 119940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:52,033-Speed 3267.04 samples/sec Loss 1.1627 LearningRate 0.0005 Epoch: 23 Global Step: 119950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:55,028-Speed 3420.07 samples/sec Loss 1.1891 LearningRate 0.0005 Epoch: 23 Global Step: 119960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:17:58,018-Speed 3426.20 samples/sec Loss 1.1722 LearningRate 0.0005 Epoch: 23 Global Step: 119970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:18:01,027-Speed 3403.62 samples/sec Loss 1.2304 LearningRate 0.0005 Epoch: 23 Global Step: 119980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:18:04,033-Speed 3407.98 samples/sec Loss 1.1500 LearningRate 0.0005 Epoch: 23 Global Step: 119990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:18:07,021-Speed 3427.39 samples/sec Loss 1.0991 LearningRate 0.0005 Epoch: 23 Global Step: 120000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:18:49,963-[lfw][120000]XNorm: 22.315873 Training: 2022-01-20 05:18:49,964-[lfw][120000]Accuracy-Flip: 0.99817+-0.00252 Training: 
2022-01-20 05:18:49,964-[lfw][120000]Accuracy-Highest: 0.99833 Training: 2022-01-20 05:19:39,833-[cfp_fp][120000]XNorm: 22.154320 Training: 2022-01-20 05:19:39,834-[cfp_fp][120000]Accuracy-Flip: 0.98943+-0.00500 Training: 2022-01-20 05:19:39,835-[cfp_fp][120000]Accuracy-Highest: 0.98957 Training: 2022-01-20 05:20:22,665-[agedb_30][120000]XNorm: 22.574880 Training: 2022-01-20 05:20:22,665-[agedb_30][120000]Accuracy-Flip: 0.98517+-0.00617 Training: 2022-01-20 05:20:22,666-[agedb_30][120000]Accuracy-Highest: 0.98517 Training: 2022-01-20 05:20:25,666-Speed 73.86 samples/sec Loss 1.1835 LearningRate 0.0005 Epoch: 23 Global Step: 120010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:20:28,664-Speed 3416.99 samples/sec Loss 1.2018 LearningRate 0.0005 Epoch: 23 Global Step: 120020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:20:31,662-Speed 3416.34 samples/sec Loss 1.1932 LearningRate 0.0005 Epoch: 23 Global Step: 120030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:20:34,642-Speed 3437.17 samples/sec Loss 1.1467 LearningRate 0.0005 Epoch: 23 Global Step: 120040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:20:37,628-Speed 3429.48 samples/sec Loss 1.1024 LearningRate 0.0005 Epoch: 23 Global Step: 120050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:20:40,613-Speed 3432.22 samples/sec Loss 1.1844 LearningRate 0.0005 Epoch: 23 Global Step: 120060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:20:43,595-Speed 3434.19 samples/sec Loss 1.1683 LearningRate 0.0005 Epoch: 23 Global Step: 120070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:20:46,587-Speed 3422.80 samples/sec Loss 1.1953 LearningRate 0.0005 Epoch: 23 Global Step: 120080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:20:49,594-Speed 3406.52 samples/sec Loss 1.1415 LearningRate 0.0005 Epoch: 23 Global Step: 120090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-20 05:20:52,576-Speed 3434.58 samples/sec Loss 1.2416 LearningRate 0.0005 Epoch: 23 Global Step: 120100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:20:55,561-Speed 3431.78 samples/sec Loss 1.2339 LearningRate 0.0005 Epoch: 23 Global Step: 120110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:20:58,582-Speed 3390.52 samples/sec Loss 1.1526 LearningRate 0.0005 Epoch: 23 Global Step: 120120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:01,582-Speed 3414.08 samples/sec Loss 1.2039 LearningRate 0.0005 Epoch: 23 Global Step: 120130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:04,601-Speed 3393.64 samples/sec Loss 1.1137 LearningRate 0.0005 Epoch: 23 Global Step: 120140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:07,646-Speed 3363.84 samples/sec Loss 1.1690 LearningRate 0.0005 Epoch: 23 Global Step: 120150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:10,622-Speed 3442.26 samples/sec Loss 1.1353 LearningRate 0.0005 Epoch: 23 Global Step: 120160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:13,622-Speed 3413.12 samples/sec Loss 1.1667 LearningRate 0.0005 Epoch: 23 Global Step: 120170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:16,625-Speed 3410.44 samples/sec Loss 1.2012 LearningRate 0.0005 Epoch: 23 Global Step: 120180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:19,625-Speed 3415.30 samples/sec Loss 1.1973 LearningRate 0.0005 Epoch: 23 Global Step: 120190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:22,639-Speed 3398.19 samples/sec Loss 1.1962 LearningRate 0.0005 Epoch: 23 Global Step: 120200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:25,630-Speed 3424.11 samples/sec Loss 1.1178 LearningRate 0.0005 Epoch: 23 Global Step: 120210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:28,615-Speed 3431.97 
samples/sec Loss 1.2291 LearningRate 0.0005 Epoch: 23 Global Step: 120220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:31,604-Speed 3425.95 samples/sec Loss 1.1369 LearningRate 0.0005 Epoch: 23 Global Step: 120230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:34,588-Speed 3433.37 samples/sec Loss 1.1100 LearningRate 0.0005 Epoch: 23 Global Step: 120240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:37,572-Speed 3432.29 samples/sec Loss 1.1520 LearningRate 0.0005 Epoch: 23 Global Step: 120250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:40,633-Speed 3346.28 samples/sec Loss 1.1373 LearningRate 0.0005 Epoch: 23 Global Step: 120260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:21:43,683-Speed 3357.56 samples/sec Loss 1.1728 LearningRate 0.0005 Epoch: 23 Global Step: 120270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:46,732-Speed 3359.58 samples/sec Loss 1.1308 LearningRate 0.0005 Epoch: 23 Global Step: 120280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:49,724-Speed 3423.22 samples/sec Loss 1.0820 LearningRate 0.0005 Epoch: 23 Global Step: 120290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:52,703-Speed 3439.27 samples/sec Loss 1.1447 LearningRate 0.0005 Epoch: 23 Global Step: 120300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:55,721-Speed 3393.78 samples/sec Loss 1.0740 LearningRate 0.0005 Epoch: 23 Global Step: 120310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:21:58,768-Speed 3361.59 samples/sec Loss 1.1485 LearningRate 0.0005 Epoch: 23 Global Step: 120320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:01,778-Speed 3402.09 samples/sec Loss 1.1142 LearningRate 0.0005 Epoch: 23 Global Step: 120330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:04,766-Speed 3432.13 samples/sec Loss 1.0685 LearningRate 0.0005 
Epoch: 23 Global Step: 120340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:07,742-Speed 3441.02 samples/sec Loss 1.1218 LearningRate 0.0005 Epoch: 23 Global Step: 120350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:10,735-Speed 3423.17 samples/sec Loss 1.2214 LearningRate 0.0005 Epoch: 23 Global Step: 120360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:13,735-Speed 3414.01 samples/sec Loss 1.1678 LearningRate 0.0005 Epoch: 23 Global Step: 120370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:22:16,717-Speed 3434.19 samples/sec Loss 1.1568 LearningRate 0.0005 Epoch: 23 Global Step: 120380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:22:19,733-Speed 3397.07 samples/sec Loss 1.1229 LearningRate 0.0005 Epoch: 23 Global Step: 120390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:22:22,717-Speed 3432.52 samples/sec Loss 1.1827 LearningRate 0.0005 Epoch: 23 Global Step: 120400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:22:25,723-Speed 3407.84 samples/sec Loss 1.1893 LearningRate 0.0005 Epoch: 23 Global Step: 120410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:28,715-Speed 3423.38 samples/sec Loss 1.1436 LearningRate 0.0005 Epoch: 23 Global Step: 120420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:31,701-Speed 3429.92 samples/sec Loss 1.1627 LearningRate 0.0005 Epoch: 23 Global Step: 120430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:34,684-Speed 3434.03 samples/sec Loss 1.1312 LearningRate 0.0005 Epoch: 23 Global Step: 120440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:37,694-Speed 3402.29 samples/sec Loss 1.1691 LearningRate 0.0005 Epoch: 23 Global Step: 120450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:40,674-Speed 3438.04 samples/sec Loss 1.1959 LearningRate 0.0004 Epoch: 23 Global Step: 120460 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:43,698-Speed 3386.41 samples/sec Loss 1.1717 LearningRate 0.0004 Epoch: 23 Global Step: 120470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:46,708-Speed 3402.65 samples/sec Loss 1.1421 LearningRate 0.0004 Epoch: 23 Global Step: 120480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:49,851-Speed 3259.25 samples/sec Loss 1.1161 LearningRate 0.0004 Epoch: 23 Global Step: 120490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:52,843-Speed 3423.84 samples/sec Loss 1.1646 LearningRate 0.0004 Epoch: 23 Global Step: 120500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:55,803-Speed 3459.90 samples/sec Loss 1.2430 LearningRate 0.0004 Epoch: 23 Global Step: 120510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:22:58,777-Speed 3444.01 samples/sec Loss 1.1158 LearningRate 0.0004 Epoch: 23 Global Step: 120520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:01,822-Speed 3364.24 samples/sec Loss 1.2119 LearningRate 0.0004 Epoch: 23 Global Step: 120530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:04,841-Speed 3392.13 samples/sec Loss 1.1665 LearningRate 0.0004 Epoch: 23 Global Step: 120540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:07,824-Speed 3433.80 samples/sec Loss 1.1585 LearningRate 0.0004 Epoch: 23 Global Step: 120550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:10,853-Speed 3382.46 samples/sec Loss 1.1234 LearningRate 0.0004 Epoch: 23 Global Step: 120560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:13,839-Speed 3430.15 samples/sec Loss 1.1790 LearningRate 0.0004 Epoch: 23 Global Step: 120570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:16,821-Speed 3435.34 samples/sec Loss 1.1338 LearningRate 0.0004 Epoch: 23 Global Step: 120580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-20 05:23:19,869-Speed 3360.63 samples/sec Loss 1.1329 LearningRate 0.0004 Epoch: 23 Global Step: 120590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:22,850-Speed 3434.85 samples/sec Loss 1.2289 LearningRate 0.0004 Epoch: 23 Global Step: 120600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:25,867-Speed 3395.36 samples/sec Loss 1.1906 LearningRate 0.0004 Epoch: 23 Global Step: 120610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:23:28,871-Speed 3409.21 samples/sec Loss 1.1282 LearningRate 0.0004 Epoch: 23 Global Step: 120620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:23:31,872-Speed 3413.72 samples/sec Loss 1.1405 LearningRate 0.0004 Epoch: 23 Global Step: 120630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:23:34,841-Speed 3450.00 samples/sec Loss 1.2141 LearningRate 0.0004 Epoch: 23 Global Step: 120640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:37,819-Speed 3439.35 samples/sec Loss 1.1292 LearningRate 0.0004 Epoch: 23 Global Step: 120650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:40,805-Speed 3429.64 samples/sec Loss 1.1204 LearningRate 0.0004 Epoch: 23 Global Step: 120660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:43,840-Speed 3375.93 samples/sec Loss 1.1817 LearningRate 0.0004 Epoch: 23 Global Step: 120670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:46,826-Speed 3429.92 samples/sec Loss 1.1724 LearningRate 0.0004 Epoch: 23 Global Step: 120680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:49,815-Speed 3426.70 samples/sec Loss 1.1511 LearningRate 0.0004 Epoch: 23 Global Step: 120690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:52,806-Speed 3424.36 samples/sec Loss 1.1917 LearningRate 0.0004 Epoch: 23 Global Step: 120700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:55,800-Speed 3422.16 
samples/sec Loss 1.2920 LearningRate 0.0004 Epoch: 23 Global Step: 120710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:23:58,894-Speed 3310.08 samples/sec Loss 1.0498 LearningRate 0.0004 Epoch: 23 Global Step: 120720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:24:01,880-Speed 3430.13 samples/sec Loss 1.1838 LearningRate 0.0004 Epoch: 23 Global Step: 120730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:24:04,889-Speed 3404.26 samples/sec Loss 1.1479 LearningRate 0.0004 Epoch: 23 Global Step: 120740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:07,869-Speed 3437.04 samples/sec Loss 1.1477 LearningRate 0.0004 Epoch: 23 Global Step: 120750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:10,850-Speed 3437.38 samples/sec Loss 1.1135 LearningRate 0.0004 Epoch: 23 Global Step: 120760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:13,890-Speed 3369.03 samples/sec Loss 1.2101 LearningRate 0.0004 Epoch: 23 Global Step: 120770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:16,868-Speed 3439.44 samples/sec Loss 1.2384 LearningRate 0.0004 Epoch: 23 Global Step: 120780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:19,852-Speed 3432.54 samples/sec Loss 1.2611 LearningRate 0.0004 Epoch: 23 Global Step: 120790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:22,839-Speed 3428.67 samples/sec Loss 1.2315 LearningRate 0.0004 Epoch: 23 Global Step: 120800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:25,860-Speed 3390.02 samples/sec Loss 1.1517 LearningRate 0.0004 Epoch: 23 Global Step: 120810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:28,842-Speed 3435.61 samples/sec Loss 1.1113 LearningRate 0.0004 Epoch: 23 Global Step: 120820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:31,886-Speed 3364.07 samples/sec Loss 1.1673 LearningRate 0.0004 
Epoch: 23 Global Step: 120830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:34,892-Speed 3407.95 samples/sec Loss 1.0775 LearningRate 0.0004 Epoch: 23 Global Step: 120840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:37,914-Speed 3389.08 samples/sec Loss 1.1829 LearningRate 0.0004 Epoch: 23 Global Step: 120850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:40,988-Speed 3331.97 samples/sec Loss 1.1607 LearningRate 0.0004 Epoch: 23 Global Step: 120860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:43,982-Speed 3421.05 samples/sec Loss 1.1581 LearningRate 0.0004 Epoch: 23 Global Step: 120870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:24:46,954-Speed 3446.41 samples/sec Loss 1.1522 LearningRate 0.0004 Epoch: 23 Global Step: 120880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:24:49,974-Speed 3391.47 samples/sec Loss 1.1930 LearningRate 0.0004 Epoch: 23 Global Step: 120890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:24:52,950-Speed 3442.31 samples/sec Loss 1.2352 LearningRate 0.0004 Epoch: 23 Global Step: 120900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:24:55,931-Speed 3435.62 samples/sec Loss 1.1661 LearningRate 0.0004 Epoch: 23 Global Step: 120910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:24:58,919-Speed 3428.46 samples/sec Loss 1.2296 LearningRate 0.0004 Epoch: 23 Global Step: 120920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:25:01,898-Speed 3437.17 samples/sec Loss 1.1256 LearningRate 0.0004 Epoch: 23 Global Step: 120930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:25:05,017-Speed 3285.20 samples/sec Loss 1.1253 LearningRate 0.0004 Epoch: 23 Global Step: 120940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:25:08,040-Speed 3388.46 samples/sec Loss 1.2718 LearningRate 0.0004 Epoch: 23 Global Step: 120950 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-01-20 05:25:11,023-Speed 3434.09 samples/sec Loss 1.1597 LearningRate 0.0004 Epoch: 23 Global Step: 120960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:25:14,014-Speed 3424.14 samples/sec Loss 1.2192 LearningRate 0.0004 Epoch: 23 Global Step: 120970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:25:16,992-Speed 3439.45 samples/sec Loss 1.1912 LearningRate 0.0004 Epoch: 23 Global Step: 120980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:25:19,983-Speed 3425.01 samples/sec Loss 1.1062 LearningRate 0.0004 Epoch: 23 Global Step: 120990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-20 05:25:22,987-Speed 3409.38 samples/sec Loss 1.1773 LearningRate 0.0004 Epoch: 23 Global Step: 121000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:26,060-Speed 3332.80 samples/sec Loss 1.2167 LearningRate 0.0004 Epoch: 23 Global Step: 121010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:29,097-Speed 3373.19 samples/sec Loss 1.1881 LearningRate 0.0004 Epoch: 23 Global Step: 121020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:32,084-Speed 3428.82 samples/sec Loss 1.2450 LearningRate 0.0004 Epoch: 23 Global Step: 121030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:35,094-Speed 3443.82 samples/sec Loss 1.1912 LearningRate 0.0004 Epoch: 23 Global Step: 121040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:38,073-Speed 3437.92 samples/sec Loss 1.1479 LearningRate 0.0004 Epoch: 23 Global Step: 121050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:41,086-Speed 3438.62 samples/sec Loss 1.2078 LearningRate 0.0004 Epoch: 23 Global Step: 121060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:44,095-Speed 3404.29 samples/sec Loss 1.1593 LearningRate 0.0004 Epoch: 23 Global Step: 121070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-20 05:25:47,107-Speed 3400.37 samples/sec Loss 1.0602 LearningRate 0.0004 Epoch: 23 Global Step: 121080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:50,145-Speed 3428.17 samples/sec Loss 1.2695 LearningRate 0.0004 Epoch: 23 Global Step: 121090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:25:53,233-Speed 3316.54 samples/sec Loss 1.1727 LearningRate 0.0004 Epoch: 23 Global Step: 121100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:25:56,281-Speed 3431.55 samples/sec Loss 1.1507 LearningRate 0.0004 Epoch: 23 Global Step: 121110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:25:59,323-Speed 3366.79 samples/sec Loss 1.1810 LearningRate 0.0004 Epoch: 23 Global Step: 121120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:02,308-Speed 3430.82 samples/sec Loss 1.1288 LearningRate 0.0004 Epoch: 23 Global Step: 121130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:05,333-Speed 3387.22 samples/sec Loss 1.1791 LearningRate 0.0004 Epoch: 23 Global Step: 121140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:08,398-Speed 3341.66 samples/sec Loss 1.2160 LearningRate 0.0004 Epoch: 23 Global Step: 121150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:11,420-Speed 3388.84 samples/sec Loss 1.2149 LearningRate 0.0004 Epoch: 23 Global Step: 121160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:14,469-Speed 3359.97 samples/sec Loss 1.1682 LearningRate 0.0003 Epoch: 23 Global Step: 121170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:17,541-Speed 3333.47 samples/sec Loss 1.1713 LearningRate 0.0003 Epoch: 23 Global Step: 121180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:20,528-Speed 3430.17 samples/sec Loss 1.1490 LearningRate 0.0003 Epoch: 23 Global Step: 121190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:23,519-Speed 3424.09 
samples/sec Loss 1.1634 LearningRate 0.0003 Epoch: 23 Global Step: 121200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:26,543-Speed 3388.14 samples/sec Loss 1.1743 LearningRate 0.0003 Epoch: 23 Global Step: 121210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:29,623-Speed 3324.92 samples/sec Loss 1.1559 LearningRate 0.0003 Epoch: 23 Global Step: 121220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:32,601-Speed 3439.30 samples/sec Loss 1.2021 LearningRate 0.0003 Epoch: 23 Global Step: 121230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:35,581-Speed 3438.13 samples/sec Loss 1.1363 LearningRate 0.0003 Epoch: 23 Global Step: 121240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:38,560-Speed 3437.66 samples/sec Loss 1.3063 LearningRate 0.0003 Epoch: 23 Global Step: 121250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:41,537-Speed 3440.85 samples/sec Loss 1.1331 LearningRate 0.0003 Epoch: 23 Global Step: 121260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:44,526-Speed 3426.79 samples/sec Loss 1.1938 LearningRate 0.0003 Epoch: 23 Global Step: 121270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:47,527-Speed 3413.19 samples/sec Loss 1.1485 LearningRate 0.0003 Epoch: 23 Global Step: 121280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:50,527-Speed 3413.65 samples/sec Loss 1.1808 LearningRate 0.0003 Epoch: 23 Global Step: 121290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:53,522-Speed 3420.08 samples/sec Loss 1.1685 LearningRate 0.0003 Epoch: 23 Global Step: 121300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:56,515-Speed 3422.21 samples/sec Loss 1.1830 LearningRate 0.0003 Epoch: 23 Global Step: 121310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:26:59,503-Speed 3428.80 samples/sec Loss 1.0190 LearningRate 0.0003 
Epoch: 23 Global Step: 121320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:27:02,545-Speed 3366.26 samples/sec Loss 1.1765 LearningRate 0.0003 Epoch: 23 Global Step: 121330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:27:05,618-Speed 3333.87 samples/sec Loss 1.0828 LearningRate 0.0003 Epoch: 23 Global Step: 121340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:27:08,613-Speed 3419.15 samples/sec Loss 1.1635 LearningRate 0.0003 Epoch: 23 Global Step: 121350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:27:11,645-Speed 3377.70 samples/sec Loss 1.2036 LearningRate 0.0003 Epoch: 23 Global Step: 121360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:27:14,700-Speed 3353.30 samples/sec Loss 1.2316 LearningRate 0.0003 Epoch: 23 Global Step: 121370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:27:17,797-Speed 3307.60 samples/sec Loss 1.0885 LearningRate 0.0003 Epoch: 23 Global Step: 121380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:20,850-Speed 3356.16 samples/sec Loss 1.1898 LearningRate 0.0003 Epoch: 23 Global Step: 121390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:35,725-Speed 688.47 samples/sec Loss 1.0757 LearningRate 0.0003 Epoch: 24 Global Step: 121400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:38,737-Speed 3400.32 samples/sec Loss 0.9489 LearningRate 0.0003 Epoch: 24 Global Step: 121410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:41,760-Speed 3388.93 samples/sec Loss 1.0225 LearningRate 0.0003 Epoch: 24 Global Step: 121420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:44,786-Speed 3384.94 samples/sec Loss 1.0019 LearningRate 0.0003 Epoch: 24 Global Step: 121430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:47,822-Speed 3373.33 samples/sec Loss 1.0895 LearningRate 0.0003 Epoch: 24 Global Step: 121440 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:50,816-Speed 3421.33 samples/sec Loss 0.9870 LearningRate 0.0003 Epoch: 24 Global Step: 121450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:53,806-Speed 3424.79 samples/sec Loss 0.9892 LearningRate 0.0003 Epoch: 24 Global Step: 121460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:56,854-Speed 3361.51 samples/sec Loss 0.9425 LearningRate 0.0003 Epoch: 24 Global Step: 121470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:27:59,861-Speed 3406.99 samples/sec Loss 0.9606 LearningRate 0.0003 Epoch: 24 Global Step: 121480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:28:02,903-Speed 3366.82 samples/sec Loss 1.0080 LearningRate 0.0003 Epoch: 24 Global Step: 121490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:28:05,913-Speed 3404.04 samples/sec Loss 0.9664 LearningRate 0.0003 Epoch: 24 Global Step: 121500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:28:08,923-Speed 3401.91 samples/sec Loss 1.0812 LearningRate 0.0003 Epoch: 24 Global Step: 121510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-20 05:28:11,919-Speed 3418.81 samples/sec Loss 0.9787 LearningRate 0.0003 Epoch: 24 Global Step: 121520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:28:14,974-Speed 3352.98 samples/sec Loss 1.0160 LearningRate 0.0003 Epoch: 24 Global Step: 121530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:28:18,075-Speed 3304.06 samples/sec Loss 1.0504 LearningRate 0.0003 Epoch: 24 Global Step: 121540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:28:21,115-Speed 3369.59 samples/sec Loss 1.0897 LearningRate 0.0003 Epoch: 24 Global Step: 121550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-20 05:28:24,120-Speed 3408.57 samples/sec Loss 1.0045 LearningRate 0.0003 Epoch: 24 Global Step: 121560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-20 05:28:27,145-Speed 3385.62 samples/sec Loss 1.0085 LearningRate 0.0003 Epoch: 24 Global Step: 121570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:30,154-Speed 3404.47 samples/sec Loss 1.0255 LearningRate 0.0003 Epoch: 24 Global Step: 121580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:33,147-Speed 3421.62 samples/sec Loss 1.0627 LearningRate 0.0003 Epoch: 24 Global Step: 121590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:36,144-Speed 3419.92 samples/sec Loss 1.0126 LearningRate 0.0003 Epoch: 24 Global Step: 121600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:39,134-Speed 3424.59 samples/sec Loss 1.0748 LearningRate 0.0003 Epoch: 24 Global Step: 121610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:42,124-Speed 3425.28 samples/sec Loss 0.9636 LearningRate 0.0003 Epoch: 24 Global Step: 121620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:45,224-Speed 3304.75 samples/sec Loss 1.0703 LearningRate 0.0003 Epoch: 24 Global Step: 121630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:48,221-Speed 3417.42 samples/sec Loss 0.9976 LearningRate 0.0003 Epoch: 24 Global Step: 121640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:51,250-Speed 3381.75 samples/sec Loss 0.9793 LearningRate 0.0003 Epoch: 24 Global Step: 121650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:54,247-Speed 3417.33 samples/sec Loss 1.0551 LearningRate 0.0003 Epoch: 24 Global Step: 121660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:28:57,316-Speed 3338.01 samples/sec Loss 1.0101 LearningRate 0.0003 Epoch: 24 Global Step: 121670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:00,377-Speed 3346.48 samples/sec Loss 0.9584 LearningRate 0.0003 Epoch: 24 Global Step: 121680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:03,487-Speed 3292.42 
samples/sec Loss 1.0744 LearningRate 0.0003 Epoch: 24 Global Step: 121690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:06,507-Speed 3392.60 samples/sec Loss 1.0858 LearningRate 0.0003 Epoch: 24 Global Step: 121700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:09,543-Speed 3373.22 samples/sec Loss 0.9674 LearningRate 0.0003 Epoch: 24 Global Step: 121710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:12,662-Speed 3284.23 samples/sec Loss 1.0544 LearningRate 0.0003 Epoch: 24 Global Step: 121720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:29:15,804-Speed 3260.47 samples/sec Loss 0.9644 LearningRate 0.0003 Epoch: 24 Global Step: 121730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:18,828-Speed 3386.94 samples/sec Loss 1.0257 LearningRate 0.0003 Epoch: 24 Global Step: 121740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:21,900-Speed 3334.35 samples/sec Loss 0.9781 LearningRate 0.0003 Epoch: 24 Global Step: 121750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:24,974-Speed 3331.63 samples/sec Loss 1.0699 LearningRate 0.0003 Epoch: 24 Global Step: 121760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:27,987-Speed 3399.42 samples/sec Loss 1.0322 LearningRate 0.0003 Epoch: 24 Global Step: 121770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:31,005-Speed 3394.60 samples/sec Loss 0.9779 LearningRate 0.0003 Epoch: 24 Global Step: 121780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:33,997-Speed 3423.10 samples/sec Loss 1.0498 LearningRate 0.0003 Epoch: 24 Global Step: 121790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:37,028-Speed 3379.65 samples/sec Loss 1.0029 LearningRate 0.0003 Epoch: 24 Global Step: 121800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:40,098-Speed 3336.43 samples/sec Loss 0.9633 LearningRate 0.0003 
Epoch: 24 Global Step: 121810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:43,187-Speed 3315.88 samples/sec Loss 0.9372 LearningRate 0.0003 Epoch: 24 Global Step: 121820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:46,234-Speed 3361.49 samples/sec Loss 1.0176 LearningRate 0.0003 Epoch: 24 Global Step: 121830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:29:49,237-Speed 3410.81 samples/sec Loss 1.0471 LearningRate 0.0003 Epoch: 24 Global Step: 121840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:52,333-Speed 3308.57 samples/sec Loss 0.9194 LearningRate 0.0003 Epoch: 24 Global Step: 121850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:55,351-Speed 3395.13 samples/sec Loss 1.0367 LearningRate 0.0003 Epoch: 24 Global Step: 121860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:29:58,434-Speed 3321.91 samples/sec Loss 1.0183 LearningRate 0.0003 Epoch: 24 Global Step: 121870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:30:01,444-Speed 3403.61 samples/sec Loss 0.9606 LearningRate 0.0003 Epoch: 24 Global Step: 121880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:30:04,453-Speed 3403.80 samples/sec Loss 0.9798 LearningRate 0.0003 Epoch: 24 Global Step: 121890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:30:07,487-Speed 3376.51 samples/sec Loss 1.0670 LearningRate 0.0003 Epoch: 24 Global Step: 121900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:30:10,498-Speed 3401.17 samples/sec Loss 0.9647 LearningRate 0.0003 Epoch: 24 Global Step: 121910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:30:13,519-Speed 3390.61 samples/sec Loss 1.1426 LearningRate 0.0003 Epoch: 24 Global Step: 121920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:30:16,510-Speed 3424.75 samples/sec Loss 0.9655 LearningRate 0.0003 Epoch: 24 Global Step: 121930 Fp16 Grad 
Scale: 32768 Required: 0 hours
Training: 2022-01-20 05:30:19,595-Speed 3320.23 samples/sec Loss 0.9108 LearningRate 0.0003 Epoch: 24 Global Step: 121940 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-01-20 05:30:22,615-Speed 3392.07 samples/sec Loss 0.9353 LearningRate 0.0003 Epoch: 24 Global Step: 121950 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-01-20 05:30:25,617-Speed 3412.89 samples/sec Loss 0.9909 LearningRate 0.0003 Epoch: 24 Global Step: 121960 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-01-20 05:30:28,648-Speed 3378.59 samples/sec Loss 1.0110 LearningRate 0.0003 Epoch: 24 Global Step: 121970 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-01-20 05:30:31,670-Speed 3390.08 samples/sec Loss 0.9656 LearningRate 0.0002 Epoch: 24 Global Step: 121980 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-01-20 05:30:34,690-Speed 3391.24 samples/sec Loss 0.9850 LearningRate 0.0002 Epoch: 24 Global Step: 121990 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-01-20 05:30:37,691-Speed 3413.53 samples/sec Loss 0.8922 LearningRate 0.0002 Epoch: 24 Global Step: 122000 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-01-20 05:31:20,694-[lfw][122000]XNorm: 22.289475
Training: 2022-01-20 05:31:20,695-[lfw][122000]Accuracy-Flip: 0.99817+-0.00252
Training: 2022-01-20 05:31:20,695-[lfw][122000]Accuracy-Highest: 0.99833
Training: 2022-01-20 05:32:10,251-[cfp_fp][122000]XNorm: 22.233802
Training: 2022-01-20 05:32:10,252-[cfp_fp][122000]Accuracy-Flip: 0.98957+-0.00470
Training: 2022-01-20 05:32:10,253-[cfp_fp][122000]Accuracy-Highest: 0.98957
Training: 2022-01-20 05:32:52,931-[agedb_30][122000]XNorm: 22.616623
Training: 2022-01-20 05:32:52,931-[agedb_30][122000]Accuracy-Flip: 0.98550+-0.00679
Training: 2022-01-20 05:32:52,932-[agedb_30][122000]Accuracy-Highest: 0.98550
Training: 2022-01-20 05:32:55,906-Speed 74.09 samples/sec Loss 1.0525 LearningRate 0.0002 Epoch: 24 Global Step: 122010 Fp16 Grad Scale: 32768 Required: 0
hours Training: 2022-01-20 05:32:58,886-Speed 3436.88 samples/sec Loss 0.9535 LearningRate 0.0002 Epoch: 24 Global Step: 122020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:01,841-Speed 3466.59 samples/sec Loss 0.9839 LearningRate 0.0002 Epoch: 24 Global Step: 122030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:04,818-Speed 3440.99 samples/sec Loss 1.0054 LearningRate 0.0002 Epoch: 24 Global Step: 122040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:07,811-Speed 3421.78 samples/sec Loss 0.9956 LearningRate 0.0002 Epoch: 24 Global Step: 122050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:10,832-Speed 3391.50 samples/sec Loss 0.9539 LearningRate 0.0002 Epoch: 24 Global Step: 122060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:13,816-Speed 3432.01 samples/sec Loss 1.0123 LearningRate 0.0002 Epoch: 24 Global Step: 122070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:16,817-Speed 3413.05 samples/sec Loss 1.0885 LearningRate 0.0002 Epoch: 24 Global Step: 122080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:19,879-Speed 3344.95 samples/sec Loss 0.9368 LearningRate 0.0002 Epoch: 24 Global Step: 122090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:22,877-Speed 3416.50 samples/sec Loss 0.9628 LearningRate 0.0002 Epoch: 24 Global Step: 122100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:25,867-Speed 3426.32 samples/sec Loss 1.0356 LearningRate 0.0002 Epoch: 24 Global Step: 122110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:28,859-Speed 3423.49 samples/sec Loss 1.0327 LearningRate 0.0002 Epoch: 24 Global Step: 122120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:33:31,843-Speed 3432.63 samples/sec Loss 0.9639 LearningRate 0.0002 Epoch: 24 Global Step: 122130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 
05:33:34,860-Speed 3395.55 samples/sec Loss 1.0982 LearningRate 0.0002 Epoch: 24 Global Step: 122140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:37,898-Speed 3370.96 samples/sec Loss 1.0032 LearningRate 0.0002 Epoch: 24 Global Step: 122150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:40,898-Speed 3415.62 samples/sec Loss 0.9964 LearningRate 0.0002 Epoch: 24 Global Step: 122160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:43,906-Speed 3405.51 samples/sec Loss 0.9617 LearningRate 0.0002 Epoch: 24 Global Step: 122170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:46,883-Speed 3440.85 samples/sec Loss 1.0528 LearningRate 0.0002 Epoch: 24 Global Step: 122180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:49,918-Speed 3375.03 samples/sec Loss 1.0681 LearningRate 0.0002 Epoch: 24 Global Step: 122190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:52,915-Speed 3417.42 samples/sec Loss 1.0530 LearningRate 0.0002 Epoch: 24 Global Step: 122200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:55,913-Speed 3417.76 samples/sec Loss 0.9681 LearningRate 0.0002 Epoch: 24 Global Step: 122210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:33:59,039-Speed 3276.66 samples/sec Loss 0.9955 LearningRate 0.0002 Epoch: 24 Global Step: 122220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:02,128-Speed 3315.41 samples/sec Loss 0.9982 LearningRate 0.0002 Epoch: 24 Global Step: 122230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:34:05,185-Speed 3351.06 samples/sec Loss 1.0101 LearningRate 0.0002 Epoch: 24 Global Step: 122240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:34:08,164-Speed 3438.61 samples/sec Loss 1.0665 LearningRate 0.0002 Epoch: 24 Global Step: 122250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:11,255-Speed 3313.32 samples/sec Loss 
0.9789 LearningRate 0.0002 Epoch: 24 Global Step: 122260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:14,241-Speed 3430.30 samples/sec Loss 0.9875 LearningRate 0.0002 Epoch: 24 Global Step: 122270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:17,239-Speed 3417.10 samples/sec Loss 0.9796 LearningRate 0.0002 Epoch: 24 Global Step: 122280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:20,215-Speed 3441.90 samples/sec Loss 0.9509 LearningRate 0.0002 Epoch: 24 Global Step: 122290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:23,212-Speed 3417.62 samples/sec Loss 0.9582 LearningRate 0.0002 Epoch: 24 Global Step: 122300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:26,325-Speed 3289.90 samples/sec Loss 0.9564 LearningRate 0.0002 Epoch: 24 Global Step: 122310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:29,421-Speed 3308.90 samples/sec Loss 1.0330 LearningRate 0.0002 Epoch: 24 Global Step: 122320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:32,599-Speed 3223.04 samples/sec Loss 1.0256 LearningRate 0.0002 Epoch: 24 Global Step: 122330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:35,784-Speed 3215.41 samples/sec Loss 0.9967 LearningRate 0.0002 Epoch: 24 Global Step: 122340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:38,799-Speed 3397.66 samples/sec Loss 0.9632 LearningRate 0.0002 Epoch: 24 Global Step: 122350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:34:41,785-Speed 3430.06 samples/sec Loss 0.9857 LearningRate 0.0002 Epoch: 24 Global Step: 122360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:34:44,769-Speed 3433.40 samples/sec Loss 0.9663 LearningRate 0.0002 Epoch: 24 Global Step: 122370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:34:47,753-Speed 3431.28 samples/sec Loss 0.9870 LearningRate 0.0002 Epoch: 24 Global 
Step: 122380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:34:50,753-Speed 3415.28 samples/sec Loss 0.9947 LearningRate 0.0002 Epoch: 24 Global Step: 122390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:53,730-Speed 3440.36 samples/sec Loss 0.9721 LearningRate 0.0002 Epoch: 24 Global Step: 122400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:56,715-Speed 3431.34 samples/sec Loss 0.9415 LearningRate 0.0002 Epoch: 24 Global Step: 122410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:34:59,743-Speed 3382.71 samples/sec Loss 1.0562 LearningRate 0.0002 Epoch: 24 Global Step: 122420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:02,736-Speed 3421.90 samples/sec Loss 1.0274 LearningRate 0.0002 Epoch: 24 Global Step: 122430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:05,859-Speed 3280.07 samples/sec Loss 1.0986 LearningRate 0.0002 Epoch: 24 Global Step: 122440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:08,962-Speed 3301.57 samples/sec Loss 1.0215 LearningRate 0.0002 Epoch: 24 Global Step: 122450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:11,936-Speed 3443.60 samples/sec Loss 0.9542 LearningRate 0.0002 Epoch: 24 Global Step: 122460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:14,931-Speed 3420.44 samples/sec Loss 0.9720 LearningRate 0.0002 Epoch: 24 Global Step: 122470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:18,064-Speed 3268.81 samples/sec Loss 1.0385 LearningRate 0.0002 Epoch: 24 Global Step: 122480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:21,049-Speed 3432.47 samples/sec Loss 1.0479 LearningRate 0.0002 Epoch: 24 Global Step: 122490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:35:24,047-Speed 3416.26 samples/sec Loss 0.9993 LearningRate 0.0002 Epoch: 24 Global Step: 122500 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-01-20 05:35:27,026-Speed 3437.95 samples/sec Loss 1.0035 LearningRate 0.0002 Epoch: 24 Global Step: 122510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:35:30,054-Speed 3383.65 samples/sec Loss 0.9532 LearningRate 0.0002 Epoch: 24 Global Step: 122520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:35:33,036-Speed 3434.58 samples/sec Loss 0.9832 LearningRate 0.0002 Epoch: 24 Global Step: 122530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:36,022-Speed 3430.86 samples/sec Loss 0.9899 LearningRate 0.0002 Epoch: 24 Global Step: 122540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:39,056-Speed 3376.20 samples/sec Loss 1.0139 LearningRate 0.0002 Epoch: 24 Global Step: 122550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:42,063-Speed 3406.62 samples/sec Loss 1.0101 LearningRate 0.0002 Epoch: 24 Global Step: 122560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:45,183-Speed 3283.55 samples/sec Loss 1.0324 LearningRate 0.0002 Epoch: 24 Global Step: 122570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:48,264-Speed 3323.65 samples/sec Loss 1.0093 LearningRate 0.0002 Epoch: 24 Global Step: 122580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:51,243-Speed 3439.50 samples/sec Loss 0.9694 LearningRate 0.0002 Epoch: 24 Global Step: 122590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:54,250-Speed 3406.38 samples/sec Loss 1.0006 LearningRate 0.0002 Epoch: 24 Global Step: 122600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:35:57,256-Speed 3407.60 samples/sec Loss 1.1146 LearningRate 0.0002 Epoch: 24 Global Step: 122610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:00,292-Speed 3374.02 samples/sec Loss 0.9960 LearningRate 0.0002 Epoch: 24 Global Step: 122620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 
05:36:03,347-Speed 3352.65 samples/sec Loss 0.9272 LearningRate 0.0002 Epoch: 24 Global Step: 122630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:36:06,343-Speed 3419.02 samples/sec Loss 0.9362 LearningRate 0.0002 Epoch: 24 Global Step: 122640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:36:09,314-Speed 3446.83 samples/sec Loss 0.9823 LearningRate 0.0002 Epoch: 24 Global Step: 122650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:12,297-Speed 3434.03 samples/sec Loss 1.0056 LearningRate 0.0002 Epoch: 24 Global Step: 122660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:15,304-Speed 3407.11 samples/sec Loss 1.0236 LearningRate 0.0002 Epoch: 24 Global Step: 122670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:18,286-Speed 3434.89 samples/sec Loss 1.0508 LearningRate 0.0002 Epoch: 24 Global Step: 122680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:21,317-Speed 3380.04 samples/sec Loss 0.9731 LearningRate 0.0002 Epoch: 24 Global Step: 122690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:24,409-Speed 3313.48 samples/sec Loss 0.9281 LearningRate 0.0002 Epoch: 24 Global Step: 122700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:27,384-Speed 3441.99 samples/sec Loss 1.0446 LearningRate 0.0002 Epoch: 24 Global Step: 122710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:30,384-Speed 3414.80 samples/sec Loss 0.9948 LearningRate 0.0002 Epoch: 24 Global Step: 122720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:33,381-Speed 3418.08 samples/sec Loss 1.0243 LearningRate 0.0002 Epoch: 24 Global Step: 122730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:36,363-Speed 3434.39 samples/sec Loss 1.0498 LearningRate 0.0002 Epoch: 24 Global Step: 122740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:39,360-Speed 3418.11 samples/sec Loss 
0.9989 LearningRate 0.0002 Epoch: 24 Global Step: 122750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:36:42,396-Speed 3372.95 samples/sec Loss 0.9774 LearningRate 0.0002 Epoch: 24 Global Step: 122760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:36:45,389-Speed 3422.87 samples/sec Loss 1.0178 LearningRate 0.0002 Epoch: 24 Global Step: 122770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:36:48,384-Speed 3420.04 samples/sec Loss 1.0047 LearningRate 0.0002 Epoch: 24 Global Step: 122780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:36:51,382-Speed 3416.52 samples/sec Loss 0.9827 LearningRate 0.0002 Epoch: 24 Global Step: 122790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:36:54,390-Speed 3404.91 samples/sec Loss 1.0110 LearningRate 0.0002 Epoch: 24 Global Step: 122800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:36:57,459-Speed 3337.72 samples/sec Loss 0.9595 LearningRate 0.0002 Epoch: 24 Global Step: 122810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:00,473-Speed 3399.78 samples/sec Loss 0.9763 LearningRate 0.0002 Epoch: 24 Global Step: 122820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:03,484-Speed 3402.07 samples/sec Loss 1.0344 LearningRate 0.0002 Epoch: 24 Global Step: 122830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:06,469-Speed 3431.16 samples/sec Loss 0.9753 LearningRate 0.0002 Epoch: 24 Global Step: 122840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:09,490-Speed 3391.01 samples/sec Loss 1.0539 LearningRate 0.0002 Epoch: 24 Global Step: 122850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:12,525-Speed 3375.05 samples/sec Loss 1.0481 LearningRate 0.0002 Epoch: 24 Global Step: 122860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:15,534-Speed 3404.73 samples/sec Loss 0.9600 LearningRate 0.0002 Epoch: 24 Global 
Step: 122870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:18,554-Speed 3390.65 samples/sec Loss 1.0577 LearningRate 0.0002 Epoch: 24 Global Step: 122880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:21,543-Speed 3426.73 samples/sec Loss 0.9595 LearningRate 0.0002 Epoch: 24 Global Step: 122890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:24,581-Speed 3372.25 samples/sec Loss 0.9943 LearningRate 0.0002 Epoch: 24 Global Step: 122900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:27,585-Speed 3409.36 samples/sec Loss 0.9652 LearningRate 0.0002 Epoch: 24 Global Step: 122910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:30,568-Speed 3434.45 samples/sec Loss 1.0122 LearningRate 0.0002 Epoch: 24 Global Step: 122920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:33,597-Speed 3380.73 samples/sec Loss 1.0592 LearningRate 0.0002 Epoch: 24 Global Step: 122930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:36,579-Speed 3435.54 samples/sec Loss 0.9952 LearningRate 0.0002 Epoch: 24 Global Step: 122940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:39,594-Speed 3397.18 samples/sec Loss 0.9554 LearningRate 0.0002 Epoch: 24 Global Step: 122950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:42,588-Speed 3421.47 samples/sec Loss 0.9917 LearningRate 0.0002 Epoch: 24 Global Step: 122960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:45,571-Speed 3433.84 samples/sec Loss 1.0495 LearningRate 0.0002 Epoch: 24 Global Step: 122970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:48,564-Speed 3422.26 samples/sec Loss 0.9876 LearningRate 0.0002 Epoch: 24 Global Step: 122980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:37:51,537-Speed 3444.63 samples/sec Loss 0.9736 LearningRate 0.0001 Epoch: 24 Global Step: 122990 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-20 05:37:54,553-Speed 3395.92 samples/sec Loss 0.9922 LearningRate 0.0001 Epoch: 24 Global Step: 123000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:37:57,554-Speed 3413.54 samples/sec Loss 0.9967 LearningRate 0.0001 Epoch: 24 Global Step: 123010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:00,532-Speed 3439.85 samples/sec Loss 0.9947 LearningRate 0.0001 Epoch: 24 Global Step: 123020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:03,527-Speed 3419.38 samples/sec Loss 0.9985 LearningRate 0.0001 Epoch: 24 Global Step: 123030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:06,510-Speed 3433.65 samples/sec Loss 1.0369 LearningRate 0.0001 Epoch: 24 Global Step: 123040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:09,505-Speed 3420.62 samples/sec Loss 1.0713 LearningRate 0.0001 Epoch: 24 Global Step: 123050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:12,504-Speed 3414.37 samples/sec Loss 0.9627 LearningRate 0.0001 Epoch: 24 Global Step: 123060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:15,486-Speed 3435.02 samples/sec Loss 0.9522 LearningRate 0.0001 Epoch: 24 Global Step: 123070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:18,469-Speed 3434.31 samples/sec Loss 1.0503 LearningRate 0.0001 Epoch: 24 Global Step: 123080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:21,451-Speed 3434.64 samples/sec Loss 1.0576 LearningRate 0.0001 Epoch: 24 Global Step: 123090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:38:24,438-Speed 3428.60 samples/sec Loss 1.0784 LearningRate 0.0001 Epoch: 24 Global Step: 123100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:38:27,439-Speed 3413.43 samples/sec Loss 1.0222 LearningRate 0.0001 Epoch: 24 Global Step: 123110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 
05:38:30,411-Speed 3446.44 samples/sec Loss 1.0303 LearningRate 0.0001 Epoch: 24 Global Step: 123120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:33,422-Speed 3402.27 samples/sec Loss 1.0541 LearningRate 0.0001 Epoch: 24 Global Step: 123130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:36,481-Speed 3348.17 samples/sec Loss 0.9982 LearningRate 0.0001 Epoch: 24 Global Step: 123140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:39,499-Speed 3395.06 samples/sec Loss 0.9455 LearningRate 0.0001 Epoch: 24 Global Step: 123150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:42,507-Speed 3404.63 samples/sec Loss 1.0254 LearningRate 0.0001 Epoch: 24 Global Step: 123160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:45,489-Speed 3435.14 samples/sec Loss 0.9734 LearningRate 0.0001 Epoch: 24 Global Step: 123170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:38:48,463-Speed 3443.99 samples/sec Loss 0.9193 LearningRate 0.0001 Epoch: 24 Global Step: 123180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:38:51,527-Speed 3343.25 samples/sec Loss 1.0950 LearningRate 0.0001 Epoch: 24 Global Step: 123190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:38:54,564-Speed 3373.58 samples/sec Loss 0.9640 LearningRate 0.0001 Epoch: 24 Global Step: 123200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:38:57,574-Speed 3402.62 samples/sec Loss 0.9815 LearningRate 0.0001 Epoch: 24 Global Step: 123210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:00,622-Speed 3360.63 samples/sec Loss 0.9783 LearningRate 0.0001 Epoch: 24 Global Step: 123220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:03,671-Speed 3358.71 samples/sec Loss 0.9965 LearningRate 0.0001 Epoch: 24 Global Step: 123230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:06,837-Speed 3235.76 samples/sec Loss 
1.0915 LearningRate 0.0001 Epoch: 24 Global Step: 123240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:09,984-Speed 3254.70 samples/sec Loss 1.0556 LearningRate 0.0001 Epoch: 24 Global Step: 123250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:13,012-Speed 3383.14 samples/sec Loss 1.0686 LearningRate 0.0001 Epoch: 24 Global Step: 123260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:16,035-Speed 3411.38 samples/sec Loss 0.9896 LearningRate 0.0001 Epoch: 24 Global Step: 123270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:19,053-Speed 3394.39 samples/sec Loss 0.9872 LearningRate 0.0001 Epoch: 24 Global Step: 123280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:39:22,036-Speed 3433.11 samples/sec Loss 0.9346 LearningRate 0.0001 Epoch: 24 Global Step: 123290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:39:25,116-Speed 3326.10 samples/sec Loss 0.9159 LearningRate 0.0001 Epoch: 24 Global Step: 123300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:39:28,116-Speed 3414.36 samples/sec Loss 0.9891 LearningRate 0.0001 Epoch: 24 Global Step: 123310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:39:31,119-Speed 3411.10 samples/sec Loss 0.9524 LearningRate 0.0001 Epoch: 24 Global Step: 123320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:39:34,098-Speed 3438.14 samples/sec Loss 0.9550 LearningRate 0.0001 Epoch: 24 Global Step: 123330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:37,121-Speed 3387.62 samples/sec Loss 1.0494 LearningRate 0.0001 Epoch: 24 Global Step: 123340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:40,118-Speed 3418.11 samples/sec Loss 0.9746 LearningRate 0.0001 Epoch: 24 Global Step: 123350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:43,105-Speed 3429.54 samples/sec Loss 1.0752 LearningRate 0.0001 Epoch: 24 Global 
Step: 123360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:46,110-Speed 3408.17 samples/sec Loss 1.0165 LearningRate 0.0001 Epoch: 24 Global Step: 123370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:49,108-Speed 3417.21 samples/sec Loss 0.9518 LearningRate 0.0001 Epoch: 24 Global Step: 123380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:52,112-Speed 3408.78 samples/sec Loss 0.9747 LearningRate 0.0001 Epoch: 24 Global Step: 123390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:55,147-Speed 3375.17 samples/sec Loss 1.0737 LearningRate 0.0001 Epoch: 24 Global Step: 123400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:39:58,150-Speed 3411.07 samples/sec Loss 1.0474 LearningRate 0.0001 Epoch: 24 Global Step: 123410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:40:01,203-Speed 3355.04 samples/sec Loss 1.0018 LearningRate 0.0001 Epoch: 24 Global Step: 123420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-20 05:40:04,190-Speed 3429.71 samples/sec Loss 1.0146 LearningRate 0.0001 Epoch: 24 Global Step: 123430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:07,180-Speed 3425.24 samples/sec Loss 0.9622 LearningRate 0.0001 Epoch: 24 Global Step: 123440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:10,205-Speed 3387.18 samples/sec Loss 0.9854 LearningRate 0.0001 Epoch: 24 Global Step: 123450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:13,204-Speed 3414.57 samples/sec Loss 0.9848 LearningRate 0.0001 Epoch: 24 Global Step: 123460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:16,179-Speed 3443.52 samples/sec Loss 1.0238 LearningRate 0.0001 Epoch: 24 Global Step: 123470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:19,200-Speed 3390.92 samples/sec Loss 0.9248 LearningRate 0.0001 Epoch: 24 Global Step: 123480 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-20 05:40:22,215-Speed 3397.37 samples/sec Loss 0.9850 LearningRate 0.0001 Epoch: 24 Global Step: 123490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:25,344-Speed 3274.25 samples/sec Loss 1.0497 LearningRate 0.0001 Epoch: 24 Global Step: 123500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:28,346-Speed 3411.20 samples/sec Loss 0.9556 LearningRate 0.0001 Epoch: 24 Global Step: 123510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:31,391-Speed 3364.41 samples/sec Loss 1.0016 LearningRate 0.0001 Epoch: 24 Global Step: 123520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:40:34,376-Speed 3431.37 samples/sec Loss 0.9638 LearningRate 0.0001 Epoch: 24 Global Step: 123530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:40:37,360-Speed 3431.76 samples/sec Loss 1.1031 LearningRate 0.0001 Epoch: 24 Global Step: 123540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:40:40,350-Speed 3427.31 samples/sec Loss 0.9370 LearningRate 0.0001 Epoch: 24 Global Step: 123550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:40:43,338-Speed 3428.47 samples/sec Loss 1.1002 LearningRate 0.0001 Epoch: 24 Global Step: 123560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:40:46,335-Speed 3417.87 samples/sec Loss 1.0556 LearningRate 0.0001 Epoch: 24 Global Step: 123570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:40:49,338-Speed 3410.14 samples/sec Loss 1.0157 LearningRate 0.0001 Epoch: 24 Global Step: 123580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:40:52,321-Speed 3433.34 samples/sec Loss 0.9970 LearningRate 0.0001 Epoch: 24 Global Step: 123590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:40:55,324-Speed 3411.27 samples/sec Loss 0.9343 LearningRate 0.0001 Epoch: 24 Global Step: 123600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 
05:40:58,310-Speed 3430.19 samples/sec Loss 1.0015 LearningRate 0.0001 Epoch: 24 Global Step: 123610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:41:01,598-Speed 3116.15 samples/sec Loss 0.9954 LearningRate 0.0001 Epoch: 24 Global Step: 123620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:41:04,572-Speed 3443.33 samples/sec Loss 0.9823 LearningRate 0.0001 Epoch: 24 Global Step: 123630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:41:07,555-Speed 3433.81 samples/sec Loss 0.9573 LearningRate 0.0001 Epoch: 24 Global Step: 123640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:41:10,572-Speed 3397.74 samples/sec Loss 1.0170 LearningRate 0.0001 Epoch: 24 Global Step: 123650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:41:13,596-Speed 3387.75 samples/sec Loss 1.0110 LearningRate 0.0001 Epoch: 24 Global Step: 123660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:41:16,612-Speed 3396.63 samples/sec Loss 1.0575 LearningRate 0.0001 Epoch: 24 Global Step: 123670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:41:19,669-Speed 3364.52 samples/sec Loss 0.9494 LearningRate 0.0001 Epoch: 24 Global Step: 123680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:41:22,653-Speed 3433.80 samples/sec Loss 0.9700 LearningRate 0.0001 Epoch: 24 Global Step: 123690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:25,733-Speed 3327.17 samples/sec Loss 0.9548 LearningRate 0.0001 Epoch: 24 Global Step: 123700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:28,736-Speed 3410.25 samples/sec Loss 0.9936 LearningRate 0.0001 Epoch: 24 Global Step: 123710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:31,725-Speed 3427.98 samples/sec Loss 1.0024 LearningRate 0.0001 Epoch: 24 Global Step: 123720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:34,842-Speed 3285.63 samples/sec Loss 
0.9536 LearningRate 0.0001 Epoch: 24 Global Step: 123730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:37,868-Speed 3384.70 samples/sec Loss 1.0181 LearningRate 0.0001 Epoch: 24 Global Step: 123740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:40,876-Speed 3405.81 samples/sec Loss 0.9478 LearningRate 0.0001 Epoch: 24 Global Step: 123750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:43,864-Speed 3427.90 samples/sec Loss 0.9634 LearningRate 0.0001 Epoch: 24 Global Step: 123760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:46,959-Speed 3309.51 samples/sec Loss 0.9808 LearningRate 0.0001 Epoch: 24 Global Step: 123770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:50,014-Speed 3353.72 samples/sec Loss 0.9864 LearningRate 0.0001 Epoch: 24 Global Step: 123780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:53,001-Speed 3429.03 samples/sec Loss 1.0353 LearningRate 0.0001 Epoch: 24 Global Step: 123790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:55,984-Speed 3433.47 samples/sec Loss 1.0181 LearningRate 0.0001 Epoch: 24 Global Step: 123800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:41:58,983-Speed 3415.73 samples/sec Loss 0.9925 LearningRate 0.0001 Epoch: 24 Global Step: 123810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:42:02,000-Speed 3394.89 samples/sec Loss 1.0678 LearningRate 0.0001 Epoch: 24 Global Step: 123820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:42:04,985-Speed 3432.34 samples/sec Loss 0.9624 LearningRate 0.0001 Epoch: 24 Global Step: 123830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:42:08,059-Speed 3331.22 samples/sec Loss 1.0006 LearningRate 0.0001 Epoch: 24 Global Step: 123840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:42:11,046-Speed 3430.76 samples/sec Loss 0.9811 LearningRate 0.0001 Epoch: 24 Global 
Step: 123850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:42:14,084-Speed 3371.44 samples/sec Loss 0.9238 LearningRate 0.0001 Epoch: 24 Global Step: 123860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:42:17,092-Speed 3404.51 samples/sec Loss 1.0152 LearningRate 0.0001 Epoch: 24 Global Step: 123870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:42:20,080-Speed 3428.87 samples/sec Loss 1.0054 LearningRate 0.0001 Epoch: 24 Global Step: 123880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:42:23,117-Speed 3372.04 samples/sec Loss 0.9950 LearningRate 0.0001 Epoch: 24 Global Step: 123890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:26,104-Speed 3429.80 samples/sec Loss 1.0298 LearningRate 0.0001 Epoch: 24 Global Step: 123900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:29,239-Speed 3266.89 samples/sec Loss 1.0889 LearningRate 0.0001 Epoch: 24 Global Step: 123910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:32,384-Speed 3257.14 samples/sec Loss 0.9457 LearningRate 0.0001 Epoch: 24 Global Step: 123920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:35,372-Speed 3428.85 samples/sec Loss 1.0002 LearningRate 0.0001 Epoch: 24 Global Step: 123930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:38,440-Speed 3338.12 samples/sec Loss 0.9762 LearningRate 0.0001 Epoch: 24 Global Step: 123940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:41,515-Speed 3331.73 samples/sec Loss 0.9428 LearningRate 0.0001 Epoch: 24 Global Step: 123950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:44,501-Speed 3429.85 samples/sec Loss 1.0038 LearningRate 0.0001 Epoch: 24 Global Step: 123960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:47,602-Speed 3303.14 samples/sec Loss 0.9726 LearningRate 0.0001 Epoch: 24 Global Step: 123970 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-01-20 05:42:50,784-Speed 3219.77 samples/sec Loss 1.1227 LearningRate 0.0001 Epoch: 24 Global Step: 123980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:53,793-Speed 3403.30 samples/sec Loss 1.1146 LearningRate 0.0001 Epoch: 24 Global Step: 123990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:42:56,962-Speed 3232.58 samples/sec Loss 0.9888 LearningRate 0.0001 Epoch: 24 Global Step: 124000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:43:40,108-[lfw][124000]XNorm: 22.386551 Training: 2022-01-20 05:43:40,109-[lfw][124000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-01-20 05:43:40,109-[lfw][124000]Accuracy-Highest: 0.99833 Training: 2022-01-20 05:44:29,945-[cfp_fp][124000]XNorm: 22.335605 Training: 2022-01-20 05:44:29,946-[cfp_fp][124000]Accuracy-Flip: 0.98957+-0.00491 Training: 2022-01-20 05:44:29,946-[cfp_fp][124000]Accuracy-Highest: 0.98957 Training: 2022-01-20 05:45:12,829-[agedb_30][124000]XNorm: 22.675243 Training: 2022-01-20 05:45:12,830-[agedb_30][124000]Accuracy-Flip: 0.98467+-0.00657 Training: 2022-01-20 05:45:12,830-[agedb_30][124000]Accuracy-Highest: 0.98550 Training: 2022-01-20 05:45:15,820-Speed 73.74 samples/sec Loss 0.9999 LearningRate 0.0001 Epoch: 24 Global Step: 124010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:45:18,812-Speed 3423.52 samples/sec Loss 0.9860 LearningRate 0.0001 Epoch: 24 Global Step: 124020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:21,805-Speed 3421.47 samples/sec Loss 1.0516 LearningRate 0.0001 Epoch: 24 Global Step: 124030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:24,847-Speed 3367.94 samples/sec Loss 1.1080 LearningRate 0.0001 Epoch: 24 Global Step: 124040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:27,820-Speed 3444.45 samples/sec Loss 1.0119 LearningRate 0.0001 Epoch: 24 Global Step: 124050 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-01-20 05:45:30,801-Speed 3437.45 samples/sec Loss 1.1046 LearningRate 0.0001 Epoch: 24 Global Step: 124060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:33,781-Speed 3436.46 samples/sec Loss 1.0488 LearningRate 0.0001 Epoch: 24 Global Step: 124070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:36,882-Speed 3302.83 samples/sec Loss 0.9892 LearningRate 0.0001 Epoch: 24 Global Step: 124080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:39,917-Speed 3376.11 samples/sec Loss 1.0256 LearningRate 0.0001 Epoch: 24 Global Step: 124090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:42,910-Speed 3421.51 samples/sec Loss 0.9681 LearningRate 0.0001 Epoch: 24 Global Step: 124100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:45,912-Speed 3411.77 samples/sec Loss 1.0805 LearningRate 0.0001 Epoch: 24 Global Step: 124110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:45:48,916-Speed 3410.56 samples/sec Loss 0.9526 LearningRate 0.0001 Epoch: 24 Global Step: 124120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:45:51,976-Speed 3347.41 samples/sec Loss 1.0074 LearningRate 0.0001 Epoch: 24 Global Step: 124130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:45:54,999-Speed 3388.61 samples/sec Loss 0.9376 LearningRate 0.0001 Epoch: 24 Global Step: 124140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:45:57,987-Speed 3428.68 samples/sec Loss 0.9803 LearningRate 0.0001 Epoch: 24 Global Step: 124150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:46:01,005-Speed 3393.11 samples/sec Loss 0.9577 LearningRate 0.0001 Epoch: 24 Global Step: 124160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:04,039-Speed 3376.58 samples/sec Loss 1.0085 LearningRate 0.0001 Epoch: 24 Global Step: 124170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:07,021-Speed 
3434.62 samples/sec Loss 1.0207 LearningRate 0.0001 Epoch: 24 Global Step: 124180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:10,059-Speed 3372.02 samples/sec Loss 0.9650 LearningRate 0.0001 Epoch: 24 Global Step: 124190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:13,102-Speed 3366.26 samples/sec Loss 1.0679 LearningRate 0.0001 Epoch: 24 Global Step: 124200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:16,101-Speed 3415.57 samples/sec Loss 0.9300 LearningRate 0.0001 Epoch: 24 Global Step: 124210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:19,165-Speed 3343.06 samples/sec Loss 1.0230 LearningRate 0.0001 Epoch: 24 Global Step: 124220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:22,236-Speed 3335.90 samples/sec Loss 1.0285 LearningRate 0.0001 Epoch: 24 Global Step: 124230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:25,240-Speed 3409.48 samples/sec Loss 1.0022 LearningRate 0.0001 Epoch: 24 Global Step: 124240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:28,406-Speed 3235.84 samples/sec Loss 1.0161 LearningRate 0.0001 Epoch: 24 Global Step: 124250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:31,386-Speed 3437.25 samples/sec Loss 0.9485 LearningRate 0.0001 Epoch: 24 Global Step: 124260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:46:34,385-Speed 3415.30 samples/sec Loss 1.0181 LearningRate 0.0001 Epoch: 24 Global Step: 124270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:46:37,386-Speed 3413.36 samples/sec Loss 1.0107 LearningRate 0.0001 Epoch: 24 Global Step: 124280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:46:40,484-Speed 3306.37 samples/sec Loss 0.9509 LearningRate 0.0001 Epoch: 24 Global Step: 124290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:46:43,630-Speed 3255.23 samples/sec Loss 0.9547 
LearningRate 0.0001 Epoch: 24 Global Step: 124300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:46:46,641-Speed 3401.73 samples/sec Loss 1.0865 LearningRate 0.0001 Epoch: 24 Global Step: 124310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:49,691-Speed 3358.70 samples/sec Loss 0.9719 LearningRate 0.0001 Epoch: 24 Global Step: 124320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:52,684-Speed 3421.61 samples/sec Loss 0.9741 LearningRate 0.0001 Epoch: 24 Global Step: 124330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:55,714-Speed 3381.40 samples/sec Loss 0.9427 LearningRate 0.0001 Epoch: 24 Global Step: 124340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:46:58,761-Speed 3361.81 samples/sec Loss 0.9384 LearningRate 0.0001 Epoch: 24 Global Step: 124350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:01,748-Speed 3428.85 samples/sec Loss 1.0181 LearningRate 0.0001 Epoch: 24 Global Step: 124360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:04,727-Speed 3438.22 samples/sec Loss 1.0114 LearningRate 0.0001 Epoch: 24 Global Step: 124370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:07,704-Speed 3439.87 samples/sec Loss 0.9720 LearningRate 0.0001 Epoch: 24 Global Step: 124380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:10,682-Speed 3439.84 samples/sec Loss 1.0553 LearningRate 0.0001 Epoch: 24 Global Step: 124390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:13,685-Speed 3411.45 samples/sec Loss 1.0095 LearningRate 0.0001 Epoch: 24 Global Step: 124400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:16,712-Speed 3383.22 samples/sec Loss 1.0018 LearningRate 0.0001 Epoch: 24 Global Step: 124410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:19,744-Speed 3378.43 samples/sec Loss 1.0044 LearningRate 0.0001 Epoch: 24 Global Step: 
124420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:22,723-Speed 3438.86 samples/sec Loss 0.9721 LearningRate 0.0001 Epoch: 24 Global Step: 124430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:25,756-Speed 3376.65 samples/sec Loss 0.9992 LearningRate 0.0001 Epoch: 24 Global Step: 124440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:28,740-Speed 3432.48 samples/sec Loss 1.0138 LearningRate 0.0001 Epoch: 24 Global Step: 124450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:31,828-Speed 3316.86 samples/sec Loss 0.9563 LearningRate 0.0000 Epoch: 24 Global Step: 124460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:34,926-Speed 3307.23 samples/sec Loss 0.9833 LearningRate 0.0000 Epoch: 24 Global Step: 124470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:37,911-Speed 3430.75 samples/sec Loss 1.0474 LearningRate 0.0000 Epoch: 24 Global Step: 124480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:40,893-Speed 3435.51 samples/sec Loss 1.0443 LearningRate 0.0000 Epoch: 24 Global Step: 124490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:43,880-Speed 3429.46 samples/sec Loss 0.9590 LearningRate 0.0000 Epoch: 24 Global Step: 124500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:46,856-Speed 3441.32 samples/sec Loss 0.9617 LearningRate 0.0000 Epoch: 24 Global Step: 124510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:47:49,815-Speed 3462.41 samples/sec Loss 1.0365 LearningRate 0.0000 Epoch: 24 Global Step: 124520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:52,793-Speed 3439.64 samples/sec Loss 0.9626 LearningRate 0.0000 Epoch: 24 Global Step: 124530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:47:55,779-Speed 3430.47 samples/sec Loss 1.0027 LearningRate 0.0000 Epoch: 24 Global Step: 124540 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-01-20 05:47:58,753-Speed 3444.04 samples/sec Loss 1.0265 LearningRate 0.0000 Epoch: 24 Global Step: 124550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:01,728-Speed 3442.91 samples/sec Loss 1.0176 LearningRate 0.0000 Epoch: 24 Global Step: 124560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:04,917-Speed 3211.81 samples/sec Loss 1.0356 LearningRate 0.0000 Epoch: 24 Global Step: 124570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:07,980-Speed 3343.56 samples/sec Loss 0.9313 LearningRate 0.0000 Epoch: 24 Global Step: 124580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:10,954-Speed 3444.79 samples/sec Loss 0.9931 LearningRate 0.0000 Epoch: 24 Global Step: 124590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:13,943-Speed 3426.34 samples/sec Loss 0.9362 LearningRate 0.0000 Epoch: 24 Global Step: 124600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:16,919-Speed 3441.86 samples/sec Loss 0.9455 LearningRate 0.0000 Epoch: 24 Global Step: 124610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:19,922-Speed 3411.65 samples/sec Loss 0.9829 LearningRate 0.0000 Epoch: 24 Global Step: 124620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:22,932-Speed 3402.35 samples/sec Loss 1.0208 LearningRate 0.0000 Epoch: 24 Global Step: 124630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:25,906-Speed 3444.44 samples/sec Loss 1.0525 LearningRate 0.0000 Epoch: 24 Global Step: 124640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:28,892-Speed 3430.47 samples/sec Loss 0.9604 LearningRate 0.0000 Epoch: 24 Global Step: 124650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:31,872-Speed 3436.90 samples/sec Loss 0.9605 LearningRate 0.0000 Epoch: 24 Global Step: 124660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 
05:48:34,899-Speed 3383.22 samples/sec Loss 0.9803 LearningRate 0.0000 Epoch: 24 Global Step: 124670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:37,905-Speed 3407.83 samples/sec Loss 0.9545 LearningRate 0.0000 Epoch: 24 Global Step: 124680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:40,899-Speed 3421.56 samples/sec Loss 1.0081 LearningRate 0.0000 Epoch: 24 Global Step: 124690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:43,875-Speed 3441.72 samples/sec Loss 1.0893 LearningRate 0.0000 Epoch: 24 Global Step: 124700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:46,858-Speed 3434.18 samples/sec Loss 1.0146 LearningRate 0.0000 Epoch: 24 Global Step: 124710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:49,816-Speed 3463.11 samples/sec Loss 1.0963 LearningRate 0.0000 Epoch: 24 Global Step: 124720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:48:52,776-Speed 3459.60 samples/sec Loss 1.0394 LearningRate 0.0000 Epoch: 24 Global Step: 124730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:55,843-Speed 3341.02 samples/sec Loss 1.0246 LearningRate 0.0000 Epoch: 24 Global Step: 124740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:48:59,026-Speed 3217.14 samples/sec Loss 1.0338 LearningRate 0.0000 Epoch: 24 Global Step: 124750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:02,010-Speed 3433.18 samples/sec Loss 1.1182 LearningRate 0.0000 Epoch: 24 Global Step: 124760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:05,002-Speed 3423.52 samples/sec Loss 1.0108 LearningRate 0.0000 Epoch: 24 Global Step: 124770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:07,981-Speed 3438.51 samples/sec Loss 1.0025 LearningRate 0.0000 Epoch: 24 Global Step: 124780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:10,961-Speed 3436.96 samples/sec Loss 
0.9593 LearningRate 0.0000 Epoch: 24 Global Step: 124790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:13,940-Speed 3438.41 samples/sec Loss 0.9841 LearningRate 0.0000 Epoch: 24 Global Step: 124800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:16,916-Speed 3441.71 samples/sec Loss 0.9624 LearningRate 0.0000 Epoch: 24 Global Step: 124810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:19,898-Speed 3434.61 samples/sec Loss 1.0128 LearningRate 0.0000 Epoch: 24 Global Step: 124820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:22,877-Speed 3438.49 samples/sec Loss 1.0237 LearningRate 0.0000 Epoch: 24 Global Step: 124830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:49:25,950-Speed 3334.10 samples/sec Loss 1.0031 LearningRate 0.0000 Epoch: 24 Global Step: 124840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:49:28,969-Speed 3392.69 samples/sec Loss 0.9952 LearningRate 0.0000 Epoch: 24 Global Step: 124850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:49:31,955-Speed 3429.91 samples/sec Loss 0.9842 LearningRate 0.0000 Epoch: 24 Global Step: 124860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:49:35,027-Speed 3334.64 samples/sec Loss 1.0150 LearningRate 0.0000 Epoch: 24 Global Step: 124870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:49:38,175-Speed 3252.99 samples/sec Loss 1.0120 LearningRate 0.0000 Epoch: 24 Global Step: 124880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:41,218-Speed 3367.02 samples/sec Loss 0.9094 LearningRate 0.0000 Epoch: 24 Global Step: 124890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:44,342-Speed 3278.22 samples/sec Loss 1.0068 LearningRate 0.0000 Epoch: 24 Global Step: 124900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:47,468-Speed 3275.92 samples/sec Loss 0.9803 LearningRate 0.0000 Epoch: 24 Global 
Step: 124910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:50,674-Speed 3198.68 samples/sec Loss 0.9168 LearningRate 0.0000 Epoch: 24 Global Step: 124920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:53,768-Speed 3310.67 samples/sec Loss 1.0368 LearningRate 0.0000 Epoch: 24 Global Step: 124930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:56,752-Speed 3432.80 samples/sec Loss 0.9512 LearningRate 0.0000 Epoch: 24 Global Step: 124940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:49:59,733-Speed 3435.63 samples/sec Loss 1.0129 LearningRate 0.0000 Epoch: 24 Global Step: 124950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:50:02,714-Speed 3436.08 samples/sec Loss 1.0238 LearningRate 0.0000 Epoch: 24 Global Step: 124960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:50:05,711-Speed 3417.72 samples/sec Loss 0.9810 LearningRate 0.0000 Epoch: 24 Global Step: 124970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:50:08,710-Speed 3415.16 samples/sec Loss 1.0962 LearningRate 0.0000 Epoch: 24 Global Step: 124980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:11,687-Speed 3440.36 samples/sec Loss 0.9940 LearningRate 0.0000 Epoch: 24 Global Step: 124990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:14,681-Speed 3421.72 samples/sec Loss 1.0069 LearningRate 0.0000 Epoch: 24 Global Step: 125000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:17,661-Speed 3436.98 samples/sec Loss 1.0415 LearningRate 0.0000 Epoch: 24 Global Step: 125010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:20,664-Speed 3411.91 samples/sec Loss 0.9819 LearningRate 0.0000 Epoch: 24 Global Step: 125020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:23,652-Speed 3426.91 samples/sec Loss 0.9402 LearningRate 0.0000 Epoch: 24 Global Step: 125030 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-01-20 05:50:26,629-Speed 3441.21 samples/sec Loss 0.9537 LearningRate 0.0000 Epoch: 24 Global Step: 125040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:29,622-Speed 3422.52 samples/sec Loss 1.0192 LearningRate 0.0000 Epoch: 24 Global Step: 125050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:32,694-Speed 3333.72 samples/sec Loss 1.0311 LearningRate 0.0000 Epoch: 24 Global Step: 125060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:35,677-Speed 3434.04 samples/sec Loss 1.0108 LearningRate 0.0000 Epoch: 24 Global Step: 125070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:38,705-Speed 3383.61 samples/sec Loss 0.9776 LearningRate 0.0000 Epoch: 24 Global Step: 125080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:41,686-Speed 3435.91 samples/sec Loss 1.0001 LearningRate 0.0000 Epoch: 24 Global Step: 125090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:50:44,663-Speed 3441.00 samples/sec Loss 0.9488 LearningRate 0.0000 Epoch: 24 Global Step: 125100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:50:47,667-Speed 3408.58 samples/sec Loss 1.0374 LearningRate 0.0000 Epoch: 24 Global Step: 125110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:50:50,646-Speed 3439.16 samples/sec Loss 0.9158 LearningRate 0.0000 Epoch: 24 Global Step: 125120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:50:53,626-Speed 3437.78 samples/sec Loss 0.9506 LearningRate 0.0000 Epoch: 24 Global Step: 125130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:50:56,642-Speed 3395.63 samples/sec Loss 1.0239 LearningRate 0.0000 Epoch: 24 Global Step: 125140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:50:59,713-Speed 3335.38 samples/sec Loss 1.0857 LearningRate 0.0000 Epoch: 24 Global Step: 125150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 
05:51:02,878-Speed 3237.19 samples/sec Loss 0.9620 LearningRate 0.0000 Epoch: 24 Global Step: 125160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:05,896-Speed 3393.57 samples/sec Loss 1.0535 LearningRate 0.0000 Epoch: 24 Global Step: 125170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:08,887-Speed 3424.57 samples/sec Loss 0.9263 LearningRate 0.0000 Epoch: 24 Global Step: 125180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:11,870-Speed 3434.20 samples/sec Loss 1.0820 LearningRate 0.0000 Epoch: 24 Global Step: 125190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:14,890-Speed 3391.69 samples/sec Loss 0.9938 LearningRate 0.0000 Epoch: 24 Global Step: 125200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:51:17,865-Speed 3443.12 samples/sec Loss 1.0854 LearningRate 0.0000 Epoch: 24 Global Step: 125210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:20,848-Speed 3432.95 samples/sec Loss 1.0709 LearningRate 0.0000 Epoch: 24 Global Step: 125220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:23,843-Speed 3420.84 samples/sec Loss 1.0086 LearningRate 0.0000 Epoch: 24 Global Step: 125230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:26,833-Speed 3425.69 samples/sec Loss 1.0537 LearningRate 0.0000 Epoch: 24 Global Step: 125240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:29,950-Speed 3286.25 samples/sec Loss 0.9894 LearningRate 0.0000 Epoch: 24 Global Step: 125250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:32,972-Speed 3388.58 samples/sec Loss 0.9850 LearningRate 0.0000 Epoch: 24 Global Step: 125260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:36,040-Speed 3339.96 samples/sec Loss 0.8956 LearningRate 0.0000 Epoch: 24 Global Step: 125270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:39,028-Speed 3428.79 samples/sec Loss 
0.9398 LearningRate 0.0000 Epoch: 24 Global Step: 125280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:42,019-Speed 3424.57 samples/sec Loss 0.9105 LearningRate 0.0000 Epoch: 24 Global Step: 125290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:44,998-Speed 3437.89 samples/sec Loss 0.9590 LearningRate 0.0000 Epoch: 24 Global Step: 125300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:51:47,984-Speed 3429.87 samples/sec Loss 0.9851 LearningRate 0.0000 Epoch: 24 Global Step: 125310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:51:50,968-Speed 3432.93 samples/sec Loss 1.0241 LearningRate 0.0000 Epoch: 24 Global Step: 125320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:51:53,964-Speed 3419.16 samples/sec Loss 0.9804 LearningRate 0.0000 Epoch: 24 Global Step: 125330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:51:57,130-Speed 3235.37 samples/sec Loss 0.9773 LearningRate 0.0000 Epoch: 24 Global Step: 125340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:00,200-Speed 3336.00 samples/sec Loss 0.9295 LearningRate 0.0000 Epoch: 24 Global Step: 125350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:03,173-Speed 3445.45 samples/sec Loss 0.8599 LearningRate 0.0000 Epoch: 24 Global Step: 125360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:06,214-Speed 3368.32 samples/sec Loss 0.9504 LearningRate 0.0000 Epoch: 24 Global Step: 125370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:09,214-Speed 3414.70 samples/sec Loss 0.9126 LearningRate 0.0000 Epoch: 24 Global Step: 125380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:12,248-Speed 3376.44 samples/sec Loss 0.9406 LearningRate 0.0000 Epoch: 24 Global Step: 125390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:15,228-Speed 3437.43 samples/sec Loss 1.0092 LearningRate 0.0000 Epoch: 24 Global 
Step: 125400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:18,215-Speed 3429.77 samples/sec Loss 1.0113 LearningRate 0.0000 Epoch: 24 Global Step: 125410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:21,228-Speed 3399.35 samples/sec Loss 0.9665 LearningRate 0.0000 Epoch: 24 Global Step: 125420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:24,225-Speed 3417.29 samples/sec Loss 1.0027 LearningRate 0.0000 Epoch: 24 Global Step: 125430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:27,291-Speed 3341.22 samples/sec Loss 1.0178 LearningRate 0.0000 Epoch: 24 Global Step: 125440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:30,393-Speed 3302.44 samples/sec Loss 1.0326 LearningRate 0.0000 Epoch: 24 Global Step: 125450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:52:33,431-Speed 3371.41 samples/sec Loss 1.0103 LearningRate 0.0000 Epoch: 24 Global Step: 125460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:36,417-Speed 3431.69 samples/sec Loss 1.0339 LearningRate 0.0000 Epoch: 24 Global Step: 125470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:39,400-Speed 3433.54 samples/sec Loss 1.0609 LearningRate 0.0000 Epoch: 24 Global Step: 125480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:42,390-Speed 3425.34 samples/sec Loss 0.9814 LearningRate 0.0000 Epoch: 24 Global Step: 125490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:45,373-Speed 3434.86 samples/sec Loss 1.0541 LearningRate 0.0000 Epoch: 24 Global Step: 125500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:48,355-Speed 3434.75 samples/sec Loss 0.9630 LearningRate 0.0000 Epoch: 24 Global Step: 125510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:51,345-Speed 3425.48 samples/sec Loss 0.9661 LearningRate 0.0000 Epoch: 24 Global Step: 125520 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-01-20 05:52:54,340-Speed 3419.50 samples/sec Loss 1.0303 LearningRate 0.0000 Epoch: 24 Global Step: 125530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:52:57,331-Speed 3425.66 samples/sec Loss 0.9563 LearningRate 0.0000 Epoch: 24 Global Step: 125540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:53:00,323-Speed 3424.44 samples/sec Loss 0.9666 LearningRate 0.0000 Epoch: 24 Global Step: 125550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:53:03,331-Speed 3404.56 samples/sec Loss 1.0811 LearningRate 0.0000 Epoch: 24 Global Step: 125560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-20 05:53:06,282-Speed 3472.52 samples/sec Loss 0.9922 LearningRate 0.0000 Epoch: 24 Global Step: 125570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:09,284-Speed 3411.00 samples/sec Loss 1.0023 LearningRate 0.0000 Epoch: 24 Global Step: 125580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:12,303-Speed 3392.82 samples/sec Loss 1.1015 LearningRate 0.0000 Epoch: 24 Global Step: 125590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:15,303-Speed 3414.56 samples/sec Loss 1.0372 LearningRate 0.0000 Epoch: 24 Global Step: 125600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:18,328-Speed 3386.18 samples/sec Loss 1.0132 LearningRate 0.0000 Epoch: 24 Global Step: 125610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:21,420-Speed 3312.89 samples/sec Loss 1.0658 LearningRate 0.0000 Epoch: 24 Global Step: 125620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:24,409-Speed 3427.14 samples/sec Loss 1.0673 LearningRate 0.0000 Epoch: 24 Global Step: 125630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:27,400-Speed 3424.59 samples/sec Loss 0.9682 LearningRate 0.0000 Epoch: 24 Global Step: 125640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 
05:53:30,428-Speed 3382.91 samples/sec Loss 0.9926 LearningRate 0.0000 Epoch: 24 Global Step: 125650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:33,414-Speed 3429.67 samples/sec Loss 1.0132 LearningRate 0.0000 Epoch: 24 Global Step: 125660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:36,405-Speed 3425.58 samples/sec Loss 0.9812 LearningRate 0.0000 Epoch: 24 Global Step: 125670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:53:39,403-Speed 3416.26 samples/sec Loss 1.0474 LearningRate 0.0000 Epoch: 24 Global Step: 125680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:53:42,391-Speed 3427.76 samples/sec Loss 1.0399 LearningRate 0.0000 Epoch: 24 Global Step: 125690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:53:45,390-Speed 3415.66 samples/sec Loss 0.9719 LearningRate 0.0000 Epoch: 24 Global Step: 125700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:53:48,365-Speed 3442.72 samples/sec Loss 0.9591 LearningRate 0.0000 Epoch: 24 Global Step: 125710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:51,349-Speed 3432.72 samples/sec Loss 1.0487 LearningRate 0.0000 Epoch: 24 Global Step: 125720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:54,334-Speed 3431.08 samples/sec Loss 0.9506 LearningRate 0.0000 Epoch: 24 Global Step: 125730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:53:57,323-Speed 3426.57 samples/sec Loss 1.0130 LearningRate 0.0000 Epoch: 24 Global Step: 125740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:54:00,329-Speed 3407.60 samples/sec Loss 0.9925 LearningRate 0.0000 Epoch: 24 Global Step: 125750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:54:03,359-Speed 3380.63 samples/sec Loss 1.0313 LearningRate 0.0000 Epoch: 24 Global Step: 125760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:54:06,387-Speed 3382.57 samples/sec Loss 
0.9516 LearningRate 0.0000 Epoch: 24 Global Step: 125770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:54:09,401-Speed 3399.04 samples/sec Loss 1.0265 LearningRate 0.0000 Epoch: 24 Global Step: 125780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:54:12,424-Speed 3387.96 samples/sec Loss 1.0096 LearningRate 0.0000 Epoch: 24 Global Step: 125790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:54:15,415-Speed 3424.27 samples/sec Loss 0.9943 LearningRate 0.0000 Epoch: 24 Global Step: 125800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:54:18,471-Speed 3352.03 samples/sec Loss 1.0141 LearningRate 0.0000 Epoch: 24 Global Step: 125810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:21,470-Speed 3428.81 samples/sec Loss 1.0025 LearningRate 0.0000 Epoch: 24 Global Step: 125820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:24,631-Speed 3240.20 samples/sec Loss 0.9735 LearningRate 0.0000 Epoch: 24 Global Step: 125830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:27,835-Speed 3196.68 samples/sec Loss 1.0220 LearningRate 0.0000 Epoch: 24 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:30,957-Speed 3280.77 samples/sec Loss 1.0583 LearningRate 0.0000 Epoch: 24 Global Step: 125850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:33,964-Speed 3405.81 samples/sec Loss 1.0162 LearningRate 0.0000 Epoch: 24 Global Step: 125860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:37,055-Speed 3314.24 samples/sec Loss 0.9966 LearningRate 0.0000 Epoch: 24 Global Step: 125870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:40,118-Speed 3343.42 samples/sec Loss 1.0316 LearningRate 0.0000 Epoch: 24 Global Step: 125880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:43,202-Speed 3322.02 samples/sec Loss 0.9785 LearningRate 0.0000 Epoch: 24 Global 
Step: 125890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:46,194-Speed 3424.06 samples/sec Loss 0.9427 LearningRate 0.0000 Epoch: 24 Global Step: 125900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:49,156-Speed 3457.28 samples/sec Loss 1.0011 LearningRate 0.0000 Epoch: 24 Global Step: 125910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:52,182-Speed 3384.42 samples/sec Loss 1.0313 LearningRate 0.0000 Epoch: 24 Global Step: 125920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:55,167-Speed 3433.07 samples/sec Loss 1.0388 LearningRate 0.0000 Epoch: 24 Global Step: 125930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:54:58,190-Speed 3388.80 samples/sec Loss 0.9687 LearningRate 0.0000 Epoch: 24 Global Step: 125940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:55:01,179-Speed 3426.57 samples/sec Loss 1.0289 LearningRate 0.0000 Epoch: 24 Global Step: 125950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:55:04,164-Speed 3431.13 samples/sec Loss 1.0214 LearningRate 0.0000 Epoch: 24 Global Step: 125960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:55:07,151-Speed 3429.02 samples/sec Loss 0.9759 LearningRate 0.0000 Epoch: 24 Global Step: 125970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:55:10,256-Speed 3302.37 samples/sec Loss 1.0146 LearningRate 0.0000 Epoch: 24 Global Step: 125980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:55:13,243-Speed 3428.79 samples/sec Loss 0.9944 LearningRate 0.0000 Epoch: 24 Global Step: 125990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:55:16,230-Speed 3429.32 samples/sec Loss 0.9898 LearningRate 0.0000 Epoch: 24 Global Step: 126000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:55:59,535-[lfw][126000]XNorm: 22.311306 Training: 2022-01-20 05:55:59,535-[lfw][126000]Accuracy-Flip: 0.99817+-0.00252 Training: 
2022-01-20 05:55:59,536-[lfw][126000]Accuracy-Highest: 0.99833 Training: 2022-01-20 05:56:49,788-[cfp_fp][126000]XNorm: 22.239278 Training: 2022-01-20 05:56:49,789-[cfp_fp][126000]Accuracy-Flip: 0.98943+-0.00479 Training: 2022-01-20 05:56:49,790-[cfp_fp][126000]Accuracy-Highest: 0.98957 Training: 2022-01-20 05:57:33,106-[agedb_30][126000]XNorm: 22.594836 Training: 2022-01-20 05:57:33,107-[agedb_30][126000]Accuracy-Flip: 0.98550+-0.00663 Training: 2022-01-20 05:57:33,107-[agedb_30][126000]Accuracy-Highest: 0.98550 Training: 2022-01-20 05:57:36,086-Speed 73.22 samples/sec Loss 1.0359 LearningRate 0.0000 Epoch: 24 Global Step: 126010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:57:39,098-Speed 3400.55 samples/sec Loss 0.9353 LearningRate 0.0000 Epoch: 24 Global Step: 126020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:57:42,107-Speed 3404.54 samples/sec Loss 0.9413 LearningRate 0.0000 Epoch: 24 Global Step: 126030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:57:45,092-Speed 3431.89 samples/sec Loss 1.0087 LearningRate 0.0000 Epoch: 24 Global Step: 126040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:57:48,106-Speed 3398.70 samples/sec Loss 0.9740 LearningRate 0.0000 Epoch: 24 Global Step: 126050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:57:51,133-Speed 3384.32 samples/sec Loss 1.0242 LearningRate 0.0000 Epoch: 24 Global Step: 126060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:57:54,133-Speed 3414.30 samples/sec Loss 1.0271 LearningRate 0.0000 Epoch: 24 Global Step: 126070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:57:57,124-Speed 3423.49 samples/sec Loss 1.0461 LearningRate 0.0000 Epoch: 24 Global Step: 126080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:58:00,119-Speed 3420.47 samples/sec Loss 0.9821 LearningRate 0.0000 Epoch: 24 Global Step: 126090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-20 05:58:03,129-Speed 3402.43 samples/sec Loss 0.8997 LearningRate 0.0000 Epoch: 24 Global Step: 126100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:06,162-Speed 3377.23 samples/sec Loss 1.0393 LearningRate 0.0000 Epoch: 24 Global Step: 126110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:09,162-Speed 3414.78 samples/sec Loss 1.0659 LearningRate 0.0000 Epoch: 24 Global Step: 126120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:12,148-Speed 3430.70 samples/sec Loss 0.9851 LearningRate 0.0000 Epoch: 24 Global Step: 126130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:15,136-Speed 3427.96 samples/sec Loss 0.9990 LearningRate 0.0000 Epoch: 24 Global Step: 126140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:18,237-Speed 3303.28 samples/sec Loss 1.0302 LearningRate 0.0000 Epoch: 24 Global Step: 126150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:21,377-Speed 3261.24 samples/sec Loss 0.8985 LearningRate 0.0000 Epoch: 24 Global Step: 126160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:24,369-Speed 3423.36 samples/sec Loss 1.0232 LearningRate 0.0000 Epoch: 24 Global Step: 126170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:27,366-Speed 3417.91 samples/sec Loss 0.9774 LearningRate 0.0000 Epoch: 24 Global Step: 126180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:30,372-Speed 3407.66 samples/sec Loss 1.0272 LearningRate 0.0000 Epoch: 24 Global Step: 126190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:58:33,455-Speed 3322.67 samples/sec Loss 1.0523 LearningRate 0.0000 Epoch: 24 Global Step: 126200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:36,494-Speed 3369.90 samples/sec Loss 1.0307 LearningRate 0.0000 Epoch: 24 Global Step: 126210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:39,516-Speed 3389.74 
samples/sec Loss 1.0758 LearningRate 0.0000 Epoch: 24 Global Step: 126220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:42,553-Speed 3373.16 samples/sec Loss 1.0588 LearningRate 0.0000 Epoch: 24 Global Step: 126230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:45,640-Speed 3318.38 samples/sec Loss 1.0387 LearningRate 0.0000 Epoch: 24 Global Step: 126240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:48,626-Speed 3429.85 samples/sec Loss 1.0136 LearningRate 0.0000 Epoch: 24 Global Step: 126250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:51,630-Speed 3409.74 samples/sec Loss 1.0399 LearningRate 0.0000 Epoch: 24 Global Step: 126260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:54,665-Speed 3374.94 samples/sec Loss 1.0170 LearningRate 0.0000 Epoch: 24 Global Step: 126270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:58:57,696-Speed 3379.66 samples/sec Loss 0.9446 LearningRate 0.0000 Epoch: 24 Global Step: 126280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:00,747-Speed 3357.96 samples/sec Loss 1.0439 LearningRate 0.0000 Epoch: 24 Global Step: 126290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:03,729-Speed 3434.08 samples/sec Loss 0.9990 LearningRate 0.0000 Epoch: 24 Global Step: 126300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:59:06,722-Speed 3422.52 samples/sec Loss 1.0030 LearningRate 0.0000 Epoch: 24 Global Step: 126310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:59:09,772-Speed 3358.32 samples/sec Loss 0.9417 LearningRate 0.0000 Epoch: 24 Global Step: 126320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:59:12,757-Speed 3431.60 samples/sec Loss 1.0104 LearningRate 0.0000 Epoch: 24 Global Step: 126330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:59:15,786-Speed 3381.42 samples/sec Loss 1.0121 LearningRate 0.0000 
Epoch: 24 Global Step: 126340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:59:18,776-Speed 3425.62 samples/sec Loss 1.0718 LearningRate 0.0000 Epoch: 24 Global Step: 126350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:59:21,840-Speed 3342.86 samples/sec Loss 0.9829 LearningRate 0.0000 Epoch: 24 Global Step: 126360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-20 05:59:24,882-Speed 3367.59 samples/sec Loss 1.0220 LearningRate 0.0000 Epoch: 24 Global Step: 126370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:27,888-Speed 3407.66 samples/sec Loss 0.9877 LearningRate 0.0000 Epoch: 24 Global Step: 126380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:30,875-Speed 3428.37 samples/sec Loss 0.9638 LearningRate 0.0000 Epoch: 24 Global Step: 126390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:33,848-Speed 3445.65 samples/sec Loss 0.9908 LearningRate 0.0000 Epoch: 24 Global Step: 126400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:36,833-Speed 3431.23 samples/sec Loss 1.0176 LearningRate 0.0000 Epoch: 24 Global Step: 126410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:39,815-Speed 3436.16 samples/sec Loss 1.1024 LearningRate 0.0000 Epoch: 24 Global Step: 126420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:42,811-Speed 3418.64 samples/sec Loss 1.1714 LearningRate 0.0000 Epoch: 24 Global Step: 126430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:45,867-Speed 3352.46 samples/sec Loss 1.0469 LearningRate 0.0000 Epoch: 24 Global Step: 126440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-20 05:59:48,833-Speed 3452.55 samples/sec Loss 0.9836 LearningRate 0.0000 Epoch: 24 Global Step: 126450 Fp16 Grad Scale: 32768 Required: -0 hours